Re: RAID5 with 2 drive failure at the same time
From: Robin Hill <hidden>
Date: 2013-02-01 13:34:55
On Thu Jan 31, 2013 at 03:40:00PM -0700, Chris Murphy wrote:
On Jan 31, 2013, at 3:10 PM, Robin Hill [off-list ref] wrote:
If there is a read error further back then I'd blame it on timeout issues, with the drive still trying to complete the read operation while the kernel has timed out and is trying to send a write.

I think we need the whole log for the time before the start of the error1.txt file provided previously. I'd also like to know which /dev/ device was the first to have a problem and instigated the rebuild, and whether the file system was mounted rw during the rebuild and any writes were done at all. If so, that probably nixes --assume-clean. If it was rebuilding and not written to from the file system, the disk being rebuilt shouldn't actually be out of sync with the array state.
The timestamps on the logs show that sdg was the first to have a problem. It'd also be useful to know whether sdg has been rewritten at all since then (i.e. whether the testing was destructive or not), and whether or not the array was written to at all since the failure of sdg.
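On the timeout theory above: the kernel's per-device command timer is readable from sysfs at /sys/block/<dev>/device/timeout, in seconds. A minimal Python sketch of the comparison, assuming you supply the drive's worst-case internal recovery time yourself (the function names and the drive_recovery_s parameter are illustrative, not part of any tool):

```python
from pathlib import Path


def kernel_timeout_seconds(dev, sysfs_root="/sys/block"):
    """Read the kernel's command timeout for a disk such as 'sdg', in seconds."""
    path = Path(sysfs_root) / dev / "device" / "timeout"
    return int(path.read_text().strip())


def timeout_mismatch(kernel_timeout_s, drive_recovery_s):
    """True if the drive may still be retrying a read when the kernel gives up.

    drive_recovery_s is an assumption supplied by the caller: the longest
    time the drive firmware might spend on internal error recovery.
    """
    return drive_recovery_s >= kernel_timeout_s
```

If the drive's recovery time is at or above the kernel value, the kernel can give up and reset the link while the drive is still busy with the read, which is the situation described above.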
The disk that needs spot sector repairs is the one with UREs; I think that's sdj1. If that disk is dd'd to another disk, the new disk won't produce UREs for sectors missing data, and the chunks comprised of those sectors won't get rebuilt by md. So the disk to possibly dd to another is the one with the write error, sdg1. But only if the idea is to not use --assume-clean. That way a reassemble can rebuild, and not encounter another write error on that drive.
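The point about copied disks not producing UREs can be illustrated with a toy stripe. This is only a sketch of RAID5-style XOR parity in Python, with made-up chunk contents, assuming the copy filled the unreadable sectors with zeros (as dd with conv=noerror,sync would):

```python
from functools import reduce


def xor_chunks(chunks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*chunks))


# A toy stripe: two data chunks plus parity (contents are made up).
d0 = b"\x11\x22\x33\x44"
d1 = b"\xa0\xb0\xc0\xd0"
parity = xor_chunks([d0, d1])

# Case 1: the member reports a read error for d0, so the chunk can be
# recomputed from the remaining data chunk and the parity.
rebuilt = xor_chunks([d1, parity])

# Case 2: the copied disk returns zeros for d0 without any error, so
# nothing triggers a rebuild and the stripe is silently inconsistent.
copied_d0 = bytes(len(d0))
stripe_consistent = xor_chunks([copied_d0, d1]) == parity
```

Here rebuilt equals the original d0, while stripe_consistent is False: the data is recoverable only while the read error is still visible to md.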
Yes, if sdg still contains valid array data (and the array wasn't written to since then) then it would definitely make more sense to recreate the array using it, leaving sdj out for now. That avoids the issues with the unreadable blocks on sdj, though it will require more work checking mdadm versions and data offset values.
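For the data offset check, the relevant fields can be pulled out of saved `mdadm --examine` output. A rough Python sketch, assuming 1.x metadata (where "Data Offset" is reported in sectors and "Events" is a plain integer); the sample text below is hypothetical, not taken from this array:

```python
def parse_examine(text):
    """Turn 'Key : value' lines from mdadm --examine output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(text):
    """Extract the data offset (in sectors) and event count for one member."""
    f = parse_examine(text)
    return {
        "data_offset_sectors": int(f["Data Offset"].split()[0]),
        "events": int(f["Events"]),
    }


# Hypothetical sample output for one member device.
SAMPLE = """/dev/sdg1:
    Data Offset : 2048 sectors
    Update Time : Thu Jan 31 10:00:00 2013
         Events : 1200
"""
```

Comparing the summaries across members shows whether they agree on the data offset, which is the value a re-create with a different mdadm version could get wrong, and how far behind sdg's event count is.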
Not a chance I'd use it if it's actually failing to remap bad sectors, no. Only had that with one drive so far though (out of several hundred); most get failed out after getting more than a handful of remapped sectors.

I think I see a use case for badblocks destructive writes if the disk doesn't support enhanced secure erase (which writes a pattern, not just zeros), or on laptops where it's not possible to get a disk to reset on sleep, allowing it to be unfrozen for the purposes of using secure erase. But if available, secure erase is faster and wipes all sectors, even those without LBAs. For sure with SSDs it's what should be used.
I prefer badblocks myself - I can see exactly what it's doing and what
errors are seen. With secure erase you're dependent on the firmware
internals to tell you what's actually going on (and, depending on the
nature of the errors you're getting, this may already be suspect).
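To show what I mean by seeing exactly what it's doing: a destructive write test is just write-a-pattern, read-it-back, compare, with every mismatch visible to the caller. A minimal Python sketch of that logic, using the four patterns badblocks -w writes (0xaa, 0x55, 0xff, 0x00). It is meant for an ordinary scratch file only, not a block device, and it overwrites the file's contents; write_read_test and block_size are illustrative names:

```python
import os

PATTERNS = (0xAA, 0x55, 0xFF, 0x00)


def write_read_test(path, block_size=4096):
    """Overwrite a regular file with each pattern and verify it reads back.

    Returns the sorted indexes of blocks that failed verification.
    DESTRUCTIVE: the existing contents of the file are lost.
    """
    size = os.path.getsize(path)
    bad = set()
    with open(path, "r+b") as f:
        for pattern in PATTERNS:
            block = bytes([pattern]) * block_size
            # Write pass: fill the whole file with the current pattern.
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(block_size, remaining)
                f.write(block[:n])
                remaining -= n
            f.flush()
            os.fsync(f.fileno())
            # Read pass: compare every block against the pattern.
            f.seek(0)
            index = 0
            while True:
                data = f.read(block_size)
                if not data:
                    break
                if data != block[:len(data)]:
                    bad.add(index)
                index += 1
    return sorted(bad)
```

On a healthy file this returns an empty list and leaves the file zero-filled. The read-back here may well be served from the page cache, so this only illustrates the logic and is not a substitute for running badblocks against the device.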
Cheers,
Robin
--
___
( ' } | Robin Hill [off-list ref] |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |