Re: LibPATA code issues / 2.6.15.4
From: Mark Lord <hidden>
Date: 2006-02-26 14:04:24
Also in:
lkml
David Greaves wrote:
Mark Lord wrote:
sdb: Current: sense key: Medium Error
    Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sdb, sector 398283329
raid1: Disk failure on sdb2, disabling device.
        Operation continuing on 1 devices
..
The command failing above is SCSI WRITE_10, which is being translated into ATA_CMD_WRITE_FUA_EXT by libata. This command fails -- unrecognized by the drive in question. But libata reports it (most incorrectly) as a "medium error", and the drive is taken out of service from its RAID. Bad, bad, and worse.
..
Thanks, Mark. I'm glad it's a bug and not bad hardware. I am quite concerned that the basic effect of just booting a practically vanilla 2.6.16-rc4 like this was to fry my RAID array. Luckily it dropped 2 (of 3) disks so quickly that the event counter was the same, allowing an easy rebuild. 2.6.15 has similar issues, but they seem to happen *very* infrequently by comparison; this hit me several times during a single boot. Should Linus (cc'ed) hold off on 2.6.16 because of this or not?
Well, there's no doubt whatsoever about it being a "regression", since the FUA code is *new* in 2.6.16 (not present in 2.6.15). The FUA code should either be fixed or be removed from 2.6.16.

Cheers