The newer version of the aacraid driver seems to be malfunctioning.
First it couldn't find any drives at all. Then I created some containers in the BIOS, each containing a single drive, and it could find them. But after writing a few hundred MB to a partition it starts spewing I/O errors.
I/O error: dev 08:13, sector 0
EXT3-fs error (device sd(8,19)) in ext3_new_inode: IO failure
I/O error: dev 08:13, sector 0
I/O error: dev 08:13, sector 32768032
EXT3-fs error (device sd(8,19)): read_inode_bitmap: Cannot read inode bitmap - block_group = 125, inode_bitmap = 4096004
Anyone know what's going on?
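A note for anyone reading those messages: "dev 08:13" is the device number as hex major:minor, and "sd(8,19)" in the EXT3 lines is the same device in decimal. Assuming the traditional Linux SCSI disk numbering (major 8, 16 minors per disk), that is the third partition on the second disk. A minimal sketch of the arithmetic follows; the helper name decode_sd_dev is made up for illustration and is not part of any tool.

```sh
#!/bin/sh
# decode_sd_dev: hypothetical helper. Turns "MM:mm" (hex major:minor, as printed
# in "I/O error: dev 08:13" messages) into an sdXN style name.
# Assumption: only SCSI disk major 8 is handled (sda..sdp, 16 minors per disk).
decode_sd_dev() {
    major=$((0x${1%%:*}))
    minor=$((0x${1##*:}))
    if [ "$major" -ne 8 ]; then
        echo "unsupported major $major" >&2
        return 1
    fi
    disk=$((minor / 16))      # 0 = sda, 1 = sdb, ...
    part=$((minor % 16))      # 0 = whole disk, 1..15 = partitions
    letter=$(echo abcdefghijklmnop | cut -c"$((disk + 1))")
    if [ "$part" -eq 0 ]; then
        echo "sd$letter"
    else
        echo "sd$letter$part"
    fi
}

decode_sd_dev 08:13    # prints sdb3
```

So the errors above are all on one partition of the second container, not spread across every disk.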
Solved - ish
I've had the same problem with an Adaptec 2120S and Suse Linux 9.1. We were getting about 30 MB per second continuous write (not great, but better than the 15 MB per second with a 2.4 kernel), so we decided to upgrade the firmware to the latest and greatest. Did that and now we get 35 MB per second (small hoorah), but a completely unstable array with I/O errors all over the logs after writing a bit of data.
Adaptec have a newer version of the aacraid source on their site for download:
http://www.adaptec.com/worldwide/support/drivers_by_product.jsp?sess=no&language=English+US&cat=%2FProduct%2FASR-2120S&prodkey=ASR-2120S#Linux
So I grabbed that, diffed it against the current kernel source (2.6.5-7.108-default) and found rather a lot of differences. First off I compiled the current aacraid kernel module from the latest kernel source from Suse. Installed it OK.
I then tried to compile the latest driver code from Adaptec and it failed to compile for several reasons. Meanwhile, the server is still running on the aacraid kernel module I had just built. I had noticed that my aacraid.ko was slightly larger than the aacraid.ko that came with the kernel-default-2.6.5-7.108.rpm, so I did a few tests and got 37.5MB per second with a 64GB streamed write to a RAID5 array. I've had no I/O errors and have now done a bunch of continuous writes, mounted and unmounted various partitions on the RAID5 array, and all is still fine. I am a little surprised.
So I do not know what distro you are using, but you might want to go and recompile the aacraid module :)
I'm about to report this as a bug to Suse...
no longer so solved.
OK, I must have gotten lucky yesterday with the server as now I do a bit of writing to the disks and:
Sep 9 09:52:39 tcmis kernel: aacraid: Host adapter reset request. SCSI hang ?
Sep 9 09:52:39 tcmis kernel: aacraid: Host adapter appears dead
Sep 9 09:52:39 tcmis kernel: scsi: Device offlined - not ready after error recovery: host 0 channel 0 id 0 lun 0
Sep 9 09:52:39 tcmis last message repeated 99 times
Sep 9 09:52:39 tcmis kernel: SCSI error : <0 0 0 0> return code = 0x6000000
Sep 9 09:52:39 tcmis kernel: end_request: I/O error, dev sda, sector 476263772
Sep 9 09:52:39 tcmis kernel: Buffer I/O error on device sda3, logical block 935884
Sep 9 09:52:39 tcmis kernel: lost page write due to I/O error on sda3
Sep 9 09:52:39 tcmis kernel: Buffer I/O error on device sda3, logical block 935885
Sep 9 09:52:39 tcmis kernel: lost page write due to I/O error on sda3
Sep 9 09:52:39 tcmis kernel: Buffer I/O error on device sda3, logical block 935886
Sep 9 09:52:39 tcmis kernel: lost page write due to I/O error on sda3
Sep 9 09:52:39 tcmis kernel: Buffer I/O error on device sda3, logical block 935887
Sep 9 09:52:39 tcmis kernel: lost page write due to I/O error on sda3
Sep 9 09:52:39 tcmis kernel: Buffer I/O error on device sda3, logical block 935888
Sep 9 09:52:39 tcmis kernel: lost page write due to I/O error on sda3
Sep 9 09:52:39 tcmis kernel: Buffer I/O error on device sda3, logical block 935889
Sep 9 09:52:39 tcmis kernel: lost page write due to I/O error on sda3
Sep 9 09:52:39 tcmis kernel: Buffer I/O error on device sda3, logical block 935890
Sep 9 09:52:39 tcmis kernel: lost page write due to I/O error on sda3
Sep 9 09:52:39 tcmis kernel: Buffer I/O error on device sda3, logical block 935891
Sep 9 09:52:39 tcmis kernel: lost page write due to I/O error on sda3
Sep 9 09:52:39 tcmis kernel: Buffer I/O error on device sda3, logical block 935892
Sep 9 09:52:40 tcmis kernel: lost page write due to I/O error on sda3
Looks like aacraid is very unstable after all.
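With this many repeated lines it helps to boil the log down before posting it. The following is only a sketch, assuming log lines in the same format as the ones pasted above; the function name summarize_buffer_errors is hypothetical. It reads kernel log text on stdin and reports, per device, how many "Buffer I/O error" lines there were and the range of logical blocks hit.

```sh
#!/bin/sh
# summarize_buffer_errors: hypothetical helper. Reads kernel log lines on stdin
# and prints, per device, the number of "Buffer I/O error" lines seen and the
# lowest and highest logical block reported.
summarize_buffer_errors() {
    awk '
        /Buffer I\/O error on device/ {
            # Find the fields by keyword so syslog prefixes of any length work.
            for (i = 1; i <= NF; i++) {
                if ($i == "device") dev = $(i + 1)
                if ($i == "block")  blk = $(i + 1) + 0
            }
            sub(/,$/, "", dev)          # "sda3," -> "sda3"
            count[dev]++
            if (!(dev in min) || blk < min[dev]) min[dev] = blk
            if (!(dev in max) || blk > max[dev]) max[dev] = blk
        }
        END {
            for (d in count)
                printf "%s: %d errors, blocks %d-%d\n", d, count[d], min[d], max[d]
        }
    '
}

# Example with two lines copied from the log above:
printf '%s\n' \
    'Sep 9 09:52:39 tcmis kernel: Buffer I/O error on device sda3, logical block 935884' \
    'Sep 9 09:52:39 tcmis kernel: Buffer I/O error on device sda3, logical block 935892' |
    summarize_buffer_errors
```

On a live system the input could come from dmesg or the syslog file instead of the printf.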
Solved
Downgrade Adaptec 2120S firmware to version 6008 (released 06/08/2003):
6 Aug 2003 Adaptec SCSI RAID 2120S firmware upgrade version B6008
Performance is not as good as with the latest firmware (7244), but at least the RAID5 array stays up without I/O errors.
unsolved, fedup
Nope, not solved. Still plenty of errors with the Adaptec SCSI RAID card, even using the older firmware. I did quite a few test writes to the RAID array, which were all fine. I then copied the kernel source from /usr/src/linux to the RAID array and it locked up.
I now have the fun of dealing with Adaptec Technical Support :)
Exact same problem here.
I'm having practically the same problem you described with the exact same card. My error message is slightly different but I'm pretty positive it is the aacraid drivers. This is the error I receive:
SCSI0 (0/0): rejecting I/O to offline device
Buffer I/O error on device sda5, logical block 13127
That repeats about 30 times and the whole system locks up. That logical block number is arbitrary for each crash.
Did a physical test on all drives and they checked out fine. All parts are brand new. The setup:
-3 Drive RAID5 w/ 1 Extra Hotswap
-Seagate Ultra 320 Cheetah drives (ST336607LC) w/ latest firmware (0007)
-Debian Testing (Sarge)
-2.6.8 Kernel
-1.1.3 aacraid drivers compiled in
Please let me know if you find out anything significant from Adaptec Tech Support. It'll also be helpful for anyone with the same problem who happens to browse by here in the future.
ST336607LC Firmware 0007
Hi Mike,
I need that Seagate Firmware 0007 indeed, could you pls help me.
Regards
Clement @ Borneo Island
c.lim@gcs.goldcoin.ws
ST336607LC Firmware 0007
Hi Clement,
Are you able to get this Seagate 0007 firmware? I also need it.
Best Regards
Thomas Au
aacraid bad ?
Hi,
I have the same problems with Adaptec's 2110S and SuSE 9.1, but my solution was the kernel parameter acpi=off!
kehl@urz.uni-halle.de
We dont even have ACPI compil
We don't even have ACPI compiled into our kernel and it still does it.
The people we bought the server from were able to recreate the problem and are also in contact with Adaptec about this. If I hear any solutions or possible causes, I will post here about it.
We have been plagued with sam
We have been plagued with the same or similar problems on multiple servers using aacraid drivers. This random I/O error crash/reboot cycle has forced us to migrate to LSI RAID controllers. I haven't had a single hiccup with the megaraid driver yet. We tried all the things people have mentioned: disabling ACPI, disabling Hyperthreading, etc. Nothing. Look familiar?
After this, just a continual scroll of I/O errors.....
For reference: Adaptec 2120S 64-bit w/ latest firmware, U320 Seagate w/ latest drive firmware, dual 2.4GHz Xeon, Intel SE7501CW2 board, SC5250 chassis w/ Intel hotswap. Have talked to both Seagate and Adaptec... no progress after a year of correspondence.
P.S., my image was obviously
P.S., my image was obviously a console dump. The system logfiles are similar to above posters, such as:
"Sep 9 09:52:39 tcmis kernel: aacraid: Host adapter reset request. SCSI hang"
Craziness.
Adaptec 2120S with battery
Hi. After installing the backup battery module I'm also getting these I/O errors.
Without the battery module my system was running fine.
I'm running Debian Sarge (testing) with kernel 2.6.8 from Debian source, custom compiled, with the base config from the Dell boot CD (kernel 2.6.8).
We didn't have a battery modu
We didn't have a battery module in ours and it was unstable. It seems like the crashes are completely random. Sometimes we could go for a day or so without a crash.
Disabling disconnect seems to help
Hi there,
I've been struggling with a similar issue this weekend. We have a Dell PERC RAID controller here (which is Adaptec based) and use aacraid.
The externally attached RAID had I/O errors (IDE disks started to fail) and the kernel timed out, taking the device offline.
Disabling "Disconnect" in the SCSI BIOS seemed to help - no SCSI hangs so far.
Bye, Tino.
Yep. Thats exactly what I see.
The people we bought the server from replaced the 2120S card with a 2110S card and it is working perfectly so far. It doesn't use the AACRAID drivers.
Last I heard from them was that they were working directly with engineers at Adaptec. Looks like this was the only solution in the end. I guess the 2120s is just unstable in linux currently.
same problem here with 2200s
I'm having the same problem with the Adaptec 2200s card and aacraid driver. I'm using the latest firmware (7244) and the aacraid 1.1.5 driver with RHAS 2.1 update 5. Please post any info you get from the Adaptec engineers!
Sorry I dont think I will be
Sorry, I don't think I will be getting any further info on this. They decided to replace our card with the 2110S and things are working perfectly now.
The aacraid drivers are listed as experimental in Linux... I think your problem will be fixed when you get a new card or when someone gets around to testing/debugging those drivers in linux.
Do you have to recreate the volumes?
When moving from 2120s to 2110, did you have to wipe the RAID volumes, or can you just plug the 2110 and be ready to roll?
Well we backed everything up
Well we backed everything up before giving them the server. I told them they could wipe the drives if they wanted or needed to. I got the server back and it was a fresh install of Debian.
Not sure if they wiped them to be safe or if it was required. Your best bet is to try shooting a quick email to Adaptec asking that same question.
Adaptec says you can't move from 2120s to 2110s
They said that "The raid info on the 2110 is written to the last 8K on the drive. The raid info on the 2120 is written to the first 64K. The 2120 can read, and convert the data from the 2110, but not the other way around.... "
2120 to 2110
Hi:
You can't move to a 2110 even if you wipe the drives and start all over again?
Thanks
Guido
2120 to 2110
You can wipe the drives and start over again. You just can't keep the data when moving from 2120 to 2110. BTW, I seem to be getting much better performance with a 2110. I'm using the i2o drivers.
Are you using i2o drivers?
Are you using the new i2o drivers for the 2110s, or the legacy driver? (If you're using the new driver, the device should be something like /dev/i2o/..., whereas the old driver will show it as /dev/sda).
Thanks!
adaptec 2120S
Hi :
I am trying to install SUSE 8.2 with the Adaptec 2120S and it is bombing. What version of SUSE did you install with the 2110 card? I must have the 8.2 version.
Thanks
Guido
performance
Could you please give an indication of the write performance you get with megaraid and LSI:
time dd if=/dev/zero of=./8gb bs=1024k count=8192
thanks.
Just installed Suse 9.1 on a
Just installed Suse 9.1 on a 2120S w/Raid5 and 4x Seagate 36Gig w/REV7007 without problems. Without "acpi=off" it had I/O errors all the time.
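Since acpi=off keeps coming up in this thread, it is worth confirming that the running kernel was actually booted with it before reporting results either way. A small sketch, assuming a Linux system where the boot command line can be read from /proc/cmdline; the function name has_acpi_off is made up for illustration.

```sh
#!/bin/sh
# has_acpi_off: hypothetical helper. Succeeds if the given kernel command line
# contains acpi=off as a whole word.
has_acpi_off() {
    case " $1 " in
        *" acpi=off "*) return 0 ;;
        *)              return 1 ;;
    esac
}

# On a running Linux system the command line of the booted kernel is in /proc/cmdline.
if [ -r /proc/cmdline ]; then
    cmdline=$(cat /proc/cmdline)
    if has_acpi_off "$cmdline"; then
        echo "booted with acpi=off"
    else
        echo "ACPI not disabled on the kernel command line"
    fi
fi
```

This only checks the boot parameter. A kernel built without ACPI support, as one poster above describes, will report "not disabled" even though ACPI is not in use.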
Oh btw, Adaptec Firmware is B
Oh btw, Adaptec Firmware is B6011
I used the 2200s for several
I used the 2200s for several months with the B6011 firmware and had no problems. A couple weeks ago I started getting the scsi reset hang. I upgraded to the 7244 firmware and the 1.1.5 driver, but it didn't fix anything. Because it worked fine for me for a long time without a problem, I suspect it could be a hardware issue. However, so many other people seem to be having the same problem that I am not sure.
test with heavy IO
I downgraded my firmware to 6011 and the driver to 1.1.4, but it is still unstable. It holds up much better, but I can still force it to reset the card by running a large rsync to the RAID volume. I am testing again with acpi=off.
Anyone tried 2.6.9-bk5?
I see it marks the aacraid driver as no longer experimental. I myself can't try it - it clashes with the other patches I need.
AACRaid in 2.6 Kernel is plain broken
I experienced pretty much all the problems listed above with my 2120S/128MB card with firmware 7244 and 2 HDDs in a RAID-1, housed in a Supermicro SAF-TE compliant SCA caddy with a Qlogic GEM-381 SAF-TE controller. I found the 2.4 kernel was pretty stable, though I could still cause a ReiserFS panic occasionally if I popped out 1 of the SCA HDDs to simulate a RAID failure. With the 2.6.6, .7, .8, .81, .9 & .10-RC1 kernels the system ALWAYS gets a ReiserFS panic if I pop 1 of the SCA drives to simulate a RAID failure. I also got other panics randomly, but with the Supermicro SCA SAF-TE caddy I could reproduce the problem consistently.
Though there is an upside to this little story: I emailed Mark @ Adaptec and he sent me driver source for 1.1-5[2362]. This compiled against 2.6.8.1 and works PERFECTLY. Haven't had a problem, uptime is now 40 days; I'd be lucky if I got 40 minutes before. The only problem with this driver is that it will NOT compile against 2.6.9 or 2.6.10-RC1.
Wish Mark's driver could be included as an option in the kernel. He's light years ahead of the default kernel driver in patches and improvements.
ciao
Re: AACRaid in 2.6 Kernel is plain broken
Dear all,
I'm facing the same problem: it hangs even if ACPI is turned off. But I can't do all the things you mentioned because I'm installing a new system (Suse 8.2, kernel 2.4.2). Unfortunately this is our _only_ system, and it is to be replaced this weekend because our controller just faded away. Does anybody know how I could possibly put a correct driver module into the installation?
Thanks a lot,
jan
How to break aacraid module
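#!/bin/sh
# Stress test: 20 passes, each wiping /data1 and copying the kernel tree 40 times.
# WARNING: rm -rf empties /data1 on every pass.
total=1
while [ "$total" -le 20 ]
do
    date >> /root/io_test.log      # timestamp the start of each pass
    rm -rf /data1/*
    count=1
    while [ "$count" -le 40 ]
    do
        cp -r /usr/src/linux-2.6.4-52 /data1/thing$count
        count=$((count+1))
    done
    total=$((total+1))
done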
This tends to zap the aacraid module after about 10 minutes every time.
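To know when the adapter actually fell over during a run like this, the kernel log can be checked between passes. A rough sketch, assuming the failure shows up as one of the two messages quoted elsewhere in this thread; the function name adapter_hung is hypothetical.

```sh
#!/bin/sh
# adapter_hung: hypothetical helper. Reads kernel log text on stdin and succeeds
# if it contains one of the failure messages quoted in this thread.
adapter_hung() {
    grep -q -e 'aacraid: Host adapter reset request' \
            -e 'rejecting I/O to offline device'
}

# Example: check a captured log line.
if echo 'kernel: aacraid: Host adapter reset request. SCSI hang ?' | adapter_hung; then
    echo "adapter reset seen"
fi
```

Inside the outer loop of the stress script, something like "dmesg | adapter_hung && break" would stop the test at the first pass after the failure, which makes the time-to-failure easier to read off io_test.log.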
aacraid w/ 2.6.9
ya, there is an aacraid module in 2.6.9 (part of the source tree) and i'm having the same problems w/ it..
recently, we just purchased a 2410SA sata raid adapter and after a few mins of being up and in use i get the same errors:
# dmesg
Buffer I/O error on device sda3, logical block 9
lost page write due to I/O error on sda3
scsi0 (0:0): rejecting I/O to offline device
...
from the looks of it this seems to be a problem w/ the aacraid driver (since the rest of you have similar or the same problems)
anyone try compiling the 1.1.5-2326 (off adaptec's site) against 2.6.9?
In file included from /root/aacraid/linit.c:55:
drivers/scsi/hosts.h:1:2: warning: #warning "This file is obsolete, please use <scsi/scsi_host.h> instead"
/root/aacraid/linit.c: In function `aac_eh_reset':
/root/aacraid/linit.c:898: warning: implicit declaration of function `scsi_sleep'
/root/aacraid/linit.c: At top level:
/root/aacraid/linit.c:1041: error: unknown field `highmem_io' specified in initializer
make[1]: *** [/root/aacraid/linit.o] Error 1
make: *** [_module_/root/aacraid] Error 2
i can't even compile it against 2.6.8* either; different errors there, however - mostly syntax errors (compiling wrong?)
i hope there's a solution soon i have 4 300 gig sata drives just sitting there collecting dust! =(
- dan
driver source for 1.1-5[2362]
Hi!
Are you talking about 1.1.5-2326, which is available on the Adaptec web page or is there a newer version available from Mark @ Adaptec personally only?
I tried to compile 1.1.5-2326 with 2.6.8.1 kernel but get an error:
drivers/scsi/aacraid/linit.c:1042: error: unknown field `highmem_io' specified in initializer
Maybe you can forward me the patches from Mark?
Tschoeee
Roland Rosenfeld
roland@spinnaker.de
Performance + solutions
Yes, I would like to know what aacraid source it was that worked.
I tried compiling the 1.1.5-2326 from Adaptec against a Suse9.1 kernel and it failed with the same problem.
Since I posted some time ago (the BIG error posts at the top) the Adaptec 2120S RAID card broke. Returned to Adaptec, who then sent us a new one.
All plugged in and still the same I/O problems.
I am now going to add to my open technical support query with Adaptec and ask them for a new card/code that works.
PERFORMANCE
===========
The whole reason we wanted to use Suse9.1 in the first place was that the performance of the 2.4 kernel with a continuous write going on was absolutely awful. We got the 2120S up to 15MB/sec, but interactive login was not interactive at all.
Suse9.1 Pro manages 40MB/sec, but is rather unstable when doing multiple file operations (takes about 10 minutes to kill and is very repeatable).
The simple performance test we use is:
time dd if=/dev/zero of=./8gb bs=1024k count=8192
So I put Windows XP on the Intel box. Using Cygwin dd the write test gave 20MB/sec, although how direct this is to the hardware I do not know.
A freebie performance tester for HDDs gives 40MB/sec under Windows for continuous writes to our RAID5 array. It gives 80MB/sec for bursts and a read of 115MB/sec.
Therefore it does look like the aacraid driver gives pretty good performance (although we are still not impressed with the card itself), except for the stability issues.
Other people on this thread have mentioned megaraid and LSI - it would be really helpful if they could post any performance data.
i.e. please run:
time dd if=/dev/zero of=./8gb bs=1024k count=8192
with an LSI card and the megaraid driver. We would be very interested if you get 60MB/sec or above.
cheers.
Some Results
time dd if=/dev/zero of=./8gb bs=1024k count=8192
8192+0 records in
8192+0 records out
real 3m46.705s
user 0m0.019s
sys 0m20.522s
Did this using the 1.1.5 [2370] I just posted a link to, with a 2120/128MB in a 32-bit PCI slot on an i865 chipset board with a P4 2.8GHz Prescott and 2 x 10K RPM 36GB Seagate Cheetahs.
ciao
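To compare that with the MB/sec figures quoted earlier in the thread: bs=1024k count=8192 writes 8192 MiB, so the rate is just that divided by the "real" time. A minimal sketch of the conversion; the function name dd_throughput is hypothetical, and it assumes the time is in the NmS.SSSs form that the shell's time builtin printed above.

```sh
#!/bin/sh
# dd_throughput: hypothetical helper. Takes the amount written in MiB and the
# "real" time in the form printed above (e.g. 3m46.705s) and prints the
# average write rate in MiB/s.
dd_throughput() {
    mib=$1
    mins=${2%%m*}        # "3m46.705s" -> "3"
    secs=${2#*m}         # "3m46.705s" -> "46.705s"
    secs=${secs%s}       # "46.705s"   -> "46.705"
    awk -v mib="$mib" -v m="$mins" -v s="$secs" \
        'BEGIN { printf "%.1f MiB/s\n", mib / (m * 60 + s) }'
}

dd_throughput 8192 3m46.705s    # prints 36.1 MiB/s
```

Note that this is the time until dd returned, so data still sitting in the page cache at that point is counted as written.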
aacraid 1.1.5 [2370] source
Since I posted my comments about aacraid 1.1.5[2362] Mark has sent me 1.1.5[2370]. It compiles against 2.6.9. Hope he doesn't mind me posting this!
http://www.obvious.co.nz/aacraid-1.1.5-2370.tgz
ciao
aacraid 1.1.5 [2370] compiles fine - still has problems
Thanks!
this does compile fine against 2.6.9
i still have problems w/ my 2410SA card though .. it still hangs after a while (different error message now)
Nov 5 06:20:22 (none) user.err kernel: aacraid: Host adapter reset request. SCSI hang ?
Nov 5 06:21:05 (none) user.warn kernel: io_callback: io failed, status = 5
it hangs any processes using the drive (4 disks in a raid 0)
anyone having similar issues?
(linux)
-- daniel
Only tested 1.1.5 [2370] with SCSI RAID Controllers
Sorry, but I've only tested 2120S, 2200S, 2130S and 2230S SCSI RAID controllers. They seem fine with firmware 7244 or higher, and only with R1 arrays so far; I don't use any other type very much. You could try firmware 7244 on the 2410SA. I'll see if I can get a 2410SA or 2810SA to test.
ciao
firmware - 2410SA
I am using the firmware version 7244 on my 2410SA..
i was reading on the linux-scsi kernel dev list and saw that it's possible the firmware on the drive might be the issue (huh?)
i'm using:
A0391552 300GB 7200RPM DiamondMax 10 Serial ATA Internal Hard Drive
anyone else having problems w/ diamond max and adaptec sata raid?
maxtor has no updates for firmware on the sata drive (?)
our raid setup is 4 diamond max drives on a raid 0
after fdisk'ing and formatting partitions it dies with the scsi hang error message... after a hard boot the adaptec BIOS shows the drives as 150 GB drives instead of the 300GB they showed originally (weird?).. i then have to rebuild the array to show the drives as 300GB and start over..
anyone else having these sort of problems in addition to the aacraid driver saying their adaptec is hanging?
-daniel
syslog stuff (again) :
Nov 5 06:20:22 (none) user.err kernel: aacraid: Host adapter reset request. SCSI hang ?
Nov 5 06:21:05 (none) user.warn kernel: io_callback: io failed, status = 5
Source code posted for aacraid driver does compile, but fails.
Thanks for posting the source code. It did at least compile against the latest Suse 9.1Pro kernel, and the module loaded.
Mounted /dev/sda2 and then put load on it. I'm afraid it failed in the same way as before, after about 10 minutes. I certainly would not use that driver code at the moment.
So far Adaptec have told me that the drivers are not in Beta test, yet...
quote from Tech support at Adaptec:
"
The default aacraid module in the 2.6 kernel, before 2.6.8, work for
installation and light load but fail under heavy loads.
"
So far the only solution seems to be to downgrade to a 2110S or go and buy an LSI Megaraid 320-1, although I do not know what the write performance of the LSI card is like, and I am still hoping that someone on this thread will post a result with an LSI card.
cheers.
Possible Drive Issue
Hi
You might want to check out http://www.seagate.com/support/disc/u320_firmware.html
Might be a disk problem!
Cheers
2410SA no longer hangs under heavy load
You wouldn't believe the solution I used to fix the 'hangs under heavy load' issue on the 2410SA. :-) I hooked up a 4cm fan about 4 inches from the processor on the card. No more hangs! The processor really could use a heat sink!!! I don't think a fan would be required if it had a heat sink.
If anyone knows of any raid management tools that will work with aacraid on linux kernel 2.6.x, please let me know. I'll watch this space. Thanks!
Randy
Linux commandline raid tools aaccli
You can get the tools from the Adaptec CD that came with your card or from Adaptec's web site. Just download the RedHat RPM and delve into it and you should find the aaccli tool. There is also a MAKEDEV.aac script to create the /dev/aac0 device entry. There is PDF documentation for aaccli on the Adaptec web site.
There is also an X version of the Storage Manager GUI tools (Java based) but I've never tried it.
ciao
Worked also for me
I bought the Adaptec 2410SA to have some redundancy on a mail server's storage... After some uptime (2 months) the storage failed and I sadly found 3 of 4 disks containing bad data (the controller's surface check fails after a few sectors). Obviously there was no way to reconstruct anything!
I've put the card in another PC for testing and started experimenting...
After trying almost all drivers and firmwares I finally tested the cooling fan solution on my 2410SA. I can't believe it... but it seems to be working.
I was not even able to mkfs on my RAID 5 array of 4 150GB SATA drives; the whole thing was hanging within 2 minutes at most (I was testing the controller in a normal mid-tower case with poor cooling).
Now I can try to switch back to the newest firmware and drivers and stress-test the controller before taking it into production again.
I think Adaptec's engineers have to seriously consider the need for a heatsink on this card's processor... or the prospect of losing many customers.
Many thanks for your suggestion, you've probably saved me from trashing the card.
Sauro.
2410SA working
Hi everyone,
finally my raid is working correctly w/ firmware 7244 and the default aacraid driver that came w/ 2.6.9. I sent an email to Mark Salyzyn about my problem and he suggested the following:
"The aacraid driver is far too thin to be at fault for these kinds of
conditions, it must be Hardware or Firmware. The later drivers try to
mitigate error recovery scenarios though ...
If the problem is occurring only with the 2410SA, as the reports
suggest, then I would be more willing to believe that we have an Adapter
*or* Drive Firmware issue.
Another possibility is the SATA cables, the earlier ones I used here
would need just a slight mechanical stress to drop the drive. The ones
shipped in the 2410SA kits (black ends, red cable) worked far better. A dropped drive connection (as opposed to a drive that is showing media failure and is communicative) can take longer than 60 seconds to recover and fail/degrade the associated array with the older Firmware CHIMs (7244 is older from my perspective) causing the adapter to be taken offline by the operating system. Make sure the connectors are under ideal conditions (remember, SATA is meant for internal tied and relieved connections, not for external drives with many connection/disconnections)."
So i switched to using the SATA cables that came with the 2410SA and it seems to be working fine... I also have write cache disabled... although it could be the write cache i'm leaning towards the cables...
-- Daniel
My Solution
Hi All, I had the same problem. My solution is kernel 2.4.28 and the 1.1.5-9999 driver (downloaded from the Adaptec site). It looks like the server is working very stably. But I am still searching for a solution for the 2.6.x kernel.
Best Regards
Wojtek Kupiec
Some results on Sarge with 2.6.9
System has: 2 Xeon 2.8, 3G mem, _Adaptec_ 2120S + 5 FUJITSU MAS3735NC 73G
1. Do a minimal netcd install with _linux26_ (some installs fail with sda going offline)
2. Install kernel 2.6.9 from a .deb with aacraid-1.1.5[2370] compiled in
3. Reboot
Result: works stably, all disk tests passed fine.
BUT: what if one disk fails? Removing one disk can lead to 2 results:
1. The disks go idle for about 10 sec, but the system stays alive...
2. Disk activity stops and every process appears to wait for disk input/output... a minute later: "aacraid: Host adapter reset request. SCSI hang ?"
In either case _afacli_ shows the status of recovery, disk status...
?? Any success story with removing a disk on the fly??
I had same offline buffer err
I had the same offline buffer error with a 2410SA on a Dell PowerEdge with 32-bit PCI slots. I got an IBM Netfinity with a 64-bit PCI slot and just swapped the card and drives (3x Raptor 74GB) over. In the Dell, every time I tried to compile (Gentoo) it would crash after a while; in the Netfinity it worked fine, compiled for a whole weekend and nothing. Thought I'd solved it with 32-bit versus 64-bit PCI slots... wrong :(
Thinking it was solved, I started a new install to get things how I wanted them, booted the Gentoo livecd and got the same errors.
Now, during the reinstall I deleted the old arrays and created new ones. I had turned write cache off on the Dell to try to cure the problem, which it did not, but on the new arrays I turned it back on and got the problem again.
Write cache may not be the problem, but it might be a contributing factor??