Thread (8 messages), 5 authors, 2006-04-01

Re: sata controllers status=0x51 { DriveReady SeekComplete Error } error=0x84 { DriveStatusError BadCRC }

From: Technomage <hidden>
Date: 2006-03-30 19:28:21

THANK YOU! :)

it appears this might be related to some of the errors a friend of mine got 
(ref: Re: recovering data on a failed raid-0 installation).

after a bit more research, it does appear that a kernel bug, in combination 
with some "fast and loose" protocol usage on a laptop IDE interface, may have 
been at fault.

more research on this forthcoming when a drive imager device arrives 
tomorrow....

******* error output
the error reported in his case was:

ata3 status = status 0x51 { DriveReady SeekComplete Error } 
0x40 {Unrecoverable Error } <repeated 5 times>

scsi error 2010 0x8000002 
return code = sdb current sense key medium error 
additional sense unrecovered read error 
auto reallocate failed. 
end request i/o error /dev/sdb sector 22629482 
I/O error in filesystem  
md0 metadata device md0 block 0x29a1578 
xfs log mount recovery error failed error 5
xfs log mount failed mount i/o error .... kernel panic! .....

On Thursday 30 March 2006 10:26, you wrote:
Party line: It's a faulty cable (on both drives? triggered by rsync?
Doesn't show up under 'badblocks'? hah!)

Check out the linux-ide archive for my (and others') reports.

I've had lots of issues like this - spurious and IMHO incorrect error
messages. Only certain types of disk access cause them - xfs_repair and
rsync seem to tickle it.

With 2.6.15 I had lots of *very* scary moments with multiple disk
failures on a raid5 during xfs_repair.
I think it's down to the 'basic' error handling in the libata code and
certain disks/controllers being loose with the protocol. They then
identified problems in 'fua' (IIRC) handling which was pulled for 2.6.16.

2.6.16 seems to be much better (fewer 'odd' errors reported and md
doesn't mind)

David
PS Mitchell - you're still using Verizon and I still live off the edge
of their known world (in the UK) so I don't expect you'll get this reply
- hard luck my friend - get a better ISP!

Mitchell Laks wrote:
Hi,

I have a production server in place at a remote site.
I have a single system drive that is an ide drive
and two data drives that are on a via SATA controller in a raid1
configuration.

I am monitoring the /var/log/messages and I get messages every few days

Mar 22 23:31:36 A1 kernel: ata6: status=0x51 { DriveReady SeekComplete Error }
Mar 22 23:31:36 A1 kernel: ata6: error=0x84 { DriveStatusError BadCRC }

Mar 23 23:20:12 A1 kernel: ata5: status=0x51 { DriveReady SeekComplete Error }
Mar 23 23:20:12 A1 kernel: ata5: error=0x84 { DriveStatusError BadCRC }
Mar 23 23:32:03 A1 kernel: ata6: status=0x51 { DriveReady SeekComplete Error }
Mar 23 23:32:04 A1 kernel: ata6: error=0x84 { DriveStatusError BadCRC }

Mar 24 23:22:45 A1 kernel: ata5: status=0x51 { DriveReady SeekComplete Error }
Mar 24 23:22:45 A1 kernel: ata5: error=0x84 { DriveStatusError BadCRC }

Mar 27 23:16:57 A1 kernel: ata5: status=0x51 { DriveReady SeekComplete Error }
Mar 27 23:16:57 A1 kernel: ata5: error=0x84 { DriveStatusError BadCRC }

Mar 28 23:10:16 A1 kernel: ata5: status=0x51 { DriveReady SeekComplete Error }
Mar 28 23:10:17 A1 kernel: ata5: error=0x84 { DriveStatusError BadCRC }
Mar 28 23:23:32 A1 kernel: ata6: status=0x51 { DriveReady SeekComplete Error }
Mar 28 23:23:32 A1 kernel: ata6: error=0x84 { DriveStatusError BadCRC }

Mar 29 23:33:26 A1 kernel: ata6: status=0x51 { DriveReady SeekComplete Error }
Mar 29 23:33:26 A1 kernel: ata6: error=0x84 { DriveStatusError BadCRC }
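
For anyone decoding these register values by hand, here is a minimal sketch, assuming the classic ATA status and error register bit layout. The names follow the strings the kernel prints in the log above where they appear there; the remaining names and the decode() helper are illustrative labels, not copied from the libata source.

```python
# Bit -> name tables for the ATA status and error registers.
# Names seen in the log (DriveReady, SeekComplete, Error, DriveStatusError,
# BadCRC) are used as printed; the other names are illustrative labels.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

ERROR_BITS = {
    0x80: "BadCRC",              # interface CRC error on an Ultra DMA transfer
    0x40: "UncorrectableError",
    0x20: "MediaChanged",
    0x10: "SectorIdNotFound",
    0x08: "MediaChangeRequested",
    0x04: "DriveStatusError",    # command aborted
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & bit]


if __name__ == "__main__":
    print("status=0x51:", " ".join(decode(0x51, STATUS_BITS)))
    print("error=0x84: ", " ".join(decode(0x84, ERROR_BITS)))
```

0x51 is 0x40 + 0x10 + 0x01 (DriveReady, SeekComplete, Error) and 0x84 is 0x80 + 0x04 (BadCRC, DriveStatusError), which agrees with the text in braces in the log. The BadCRC bit reports a CRC mismatch on the link rather than a media defect, which is why cabling tends to get blamed first.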

Interestingly, from the logs I see that they have occurred on

March 1,2,3,8,14,17x3,20x4,21,22,23x2,24,27,28x2,29.

(x2 means two errors as in above example).

Also they occur during the activity of the cron job I do at 11pm to rsync
backup the sata drive raid 1 to another server.
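
A rough sketch of how that tally could be automated, assuming each syslog entry sits on its own line as it does in /var/log/messages. The regular expression and the tally_errors() helper are hypothetical names made up for this illustration; only the "error=" lines are counted so that each status/error pair counts once.

```python
import re
from collections import Counter

# Matches lines such as:
#   Mar 23 23:20:12 A1 kernel: ata5: error=0x84 { DriveStatusError BadCRC }
LINE_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d{1,2})\s+(?P<hour>\d\d):\d\d:\d\d\s+\S+\s+"
    r"kernel:\s+(?P<port>ata\d+):\s+error=(?P<err>0x[0-9a-fA-F]+)"
)


def tally_errors(lines):
    """Count error lines per (month, day, port) and per hour of day."""
    per_day = Counter()
    per_hour = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # not an ataN error= line
        per_day[(m.group("mon"), int(m.group("day")), m.group("port"))] += 1
        per_hour[int(m.group("hour"))] += 1
    return per_day, per_hour


if __name__ == "__main__":
    sample = [
        "Mar 23 23:20:12 A1 kernel: ata5: status=0x51 { DriveReady SeekComplete Error }",
        "Mar 23 23:20:12 A1 kernel: ata5: error=0x84 { DriveStatusError BadCRC }",
        "Mar 23 23:32:03 A1 kernel: ata6: status=0x51 { DriveReady SeekComplete Error }",
        "Mar 23 23:32:04 A1 kernel: ata6: error=0x84 { DriveStatusError BadCRC }",
    ]
    per_day, per_hour = tally_errors(sample)
    for key, count in sorted(per_day.items()):
        print(key, count)
    print("by hour:", dict(per_hour))
```

If every error lands in hour 23, that supports the link to the 11pm rsync job; errors at other hours would point away from it.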

here is the output of dmesg:


ata5: dev 0 cfg 49:2f00 82:746b 83:7f01 84:4023 85:7469 86:3c01 87:4023 88:407f
ata5: dev 0 ATA, max UDMA/133, 781422768 sectors: lba48
ata5: dev 0 configured for UDMA/133
scsi4 : sata_via
ata6: dev 0 cfg 49:2f00 82:746b 83:7f01 84:4023 85:7469 86:3c01 87:4023 88:407f
ata6: dev 0 ATA, max UDMA/133, 781422768 sectors: lba48
ata6: dev 0 configured for UDMA/133
scsi5 : sata_via
 Vendor: ATA       Model: WDC WD4000YR-01P  Rev: 01.0
 Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 781422768 512-byte hdwr sectors (400088 MB)
SCSI device sda: drive cache: write back
SCSI device sda: 781422768 512-byte hdwr sectors (400088 MB)
SCSI device sda: drive cache: write back
/dev/scsi/host4/bus0/target0/lun0: p1
Attached scsi disk sda at scsi4, channel 0, id 0, lun 0
 Vendor: ATA       Model: WDC WD4000YR-01P  Rev: 01.0
 Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sdb: 781422768 512-byte hdwr sectors (400088 MB)
SCSI device sdb: drive cache: write back
SCSI device sdb: 781422768 512-byte hdwr sectors (400088 MB)
SCSI device sdb: drive cache: write back
/dev/scsi/host5/bus0/target0/lun0: p1
Attached scsi disk sdb at scsi5, channel 0, id 0, lun 0


Am I correct in assuming that the sata drives are giving me these errors,
and what shall I do? Could it possibly be a problem with the sata
controller rather than the drives?

me@A1:~$ cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[1]
     390708736 blocks [2/2] [UU]

unused devices: <none>

I have done some testing with different sata controllers and recently
switched another server from the built in
sata controller on the A8v (via8237 controller) motherboard to  an add in
pci promise sata II150 card.

I think I have seen conflicts between the sata_via and sata_promise and I
already have a sata_promise card in the system for future expandability.

I am running the stock Debian 2.6.12-1-386 kernel and Debian sarge with
mdadm:

ii  mdadm          1.9.0-4sarge1  Manage MD devices aka Linux Software Raid


1:/var/log# lsmod|grep sata
sata_via                8452  2
sata_promise            9988  0
libata                 44164  2 sata_via,sata_promise
scsi_mod              129096  4 sr_mod,sata_promise,libata,sd_mod

Thank you very much.

Mitchell
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html