Re: RAID5 with 2 drive failure at the same time
From: Robin Hill <hidden>
Date: 2013-01-31 22:10:07
On Thu Jan 31, 2013 at 10:46:17 -0700, Chris Murphy wrote:
On Jan 31, 2013, at 6:15 AM, Christoph Nelles [off-list ref] wrote:
All drives are available again. And the second failed device reports UREs. I will run badblocks on that device before continuing. I attached the kernel logs of the first error and of the second error. I hope I filtered them reasonably.
This looks like a write error, resulting in md immediately booting the drive. There's little point in using this drive again.
Jan 28 00:23:36 router kernel: Write(16): 8a 00 00 00 00 01 36 b2 55 50 00 00 00 30 00 00
Jan 28 00:23:36 router kernel: end_request: I/O error, dev sdg, sector 5212624208
It's definitely a write error, yes. If there's nothing further back in the log (e.g. a read error that's caused a rewrite to take place) then this would definitely warn against the drive, but could just be a transient error (or a controller problem). If there is a read error further back then I'd blame it on timeout issues, with the drive still trying to complete the read operation while the kernel's timed out and trying to send a write.
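As a cross-check on the quoted log, the Write(16) CDB can be decoded by hand: in a WRITE(16) command, byte 0 is the opcode (0x8a), bytes 2-9 hold the big-endian LBA and bytes 10-13 the transfer length in blocks. A minimal Python sketch (the function name decode_write16 is made up for illustration):

```python
def decode_write16(cdb_hex):
    """Decode a SCSI WRITE(16) CDB given as space-separated hex bytes.

    Returns (lba, blocks). decode_write16 is a hypothetical helper name,
    not part of any existing tool.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x8A:
        raise ValueError("not a 16-byte WRITE(16) CDB")
    lba = int.from_bytes(cdb[2:10], "big")      # bytes 2-9: starting LBA
    blocks = int.from_bytes(cdb[10:14], "big")  # bytes 10-13: transfer length
    return lba, blocks


lba, blocks = decode_write16("8a 00 00 00 00 01 36 b2 55 50 00 00 00 30 00 00")
print(lba, blocks)  # prints: 5212624208 48
```

The decoded LBA matches the sector in the end_request line, which is consistent with the I/O error belonging to this 48-block write rather than to an earlier read.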
What does smartctl -a return for this drive?
Exactly. I am running badblocks on that device. SMART reports one "Pending Sector Count" :(
I'm unclear on the efficacy of badblocks for testing. I'd use smartctl -t long and then -a to see if there are sector problems and at what LBA; and for removing bad blocks (force a remap) I'd use either dd zeros with e.g. bs=1M, or I'd use ATA Secure Erase which is faster.
I don't usually bother with read tests - as you say, they're not terribly useful. If the data's useful then just use ddrescue to get what you can, otherwise just write-test it. I usually do a full destructive badblocks test (I've found cases where zeros write fine but other patterns fail), followed by a long SMART test.
If you use the badblocks map when formatting a drive, e.g. using mkfs.ext4 -c, then it would allow you to use this disk, but not in RAID. In an array, md gets the write error before the file system does and boots the drive out of the array, or on a read error it attempts to correct it. And even as a standalone drive do you really want to use a drive that can't remap future bad sectors?
Not a chance I'd use it if it's actually failing to remap bad sectors,
no. Only had that with one drive so far though (out of several hundred),
most get failed out after getting more than a handful of remapped
sectors.
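For keeping an eye on remapped and pending sectors, the attribute table printed by smartctl -A can be parsed with a few lines of Python. This is only a rough sketch: the sample text below is made up for illustration (it is not output from the drive in this thread), the helper names are hypothetical, and the cut-off of 5 remapped sectors is an arbitrary stand-in for "a handful".

```python
# Attribute IDs as smartctl commonly reports them for ATA drives.
REALLOCATED = 5   # Reallocated_Sector_Ct
PENDING = 197     # Current_Pending_Sector

# Illustrative sample only, NOT real output from the drive discussed here.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       1
"""


def parse_raw_values(smartctl_output):
    """Return {attribute_id: raw_value} for the two watched attributes."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns and start with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attr_id = int(fields[0])
            if attr_id in (REALLOCATED, PENDING):
                found[attr_id] = int(fields[9])
    return found


def needs_attention(values, max_remapped=5):
    """Flag any pending sector, or more remapped sectors than the
    (hypothetical) max_remapped cut-off."""
    return values.get(PENDING, 0) > 0 or values.get(REALLOCATED, 0) > max_remapped


values = parse_raw_values(SAMPLE)
print(values, needs_attention(values))
```

In practice the text would come from running smartctl -A against the device instead of the SAMPLE string, and the cut-off is a matter of personal policy.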
Cheers,
Robin
--
___
( ' } | Robin Hill [off-list ref] |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |