Re: recovering RAID5 from multiple disk failures

From: Chris Murphy <hidden>
Date: 2013-02-03 00:39:33

On Feb 2, 2013, at 5:23 PM, Phil Turmel [off-list ref] wrote:

I do disagree.

The above, combined with:

I do know where the bad sectors are from the ddrescue report. We are
talking about less than 50kB bad data on disk1. Unfortunately, disk3
is worse, but there is no sector that is bad on both disks.

Leads me to recommend "mdadm --create --assume-clean" using the original
drives, taking care to specify the devices in the proper order (per
their "Raid Device" number in the --examine reports).  I still haven't
seen any data that definitively links specific serial numbers to
specific raid device numbers.  Please do that.
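One way to collect that serial-to-role mapping is sketched below. It is only an illustration: the function name map_members is made up, it assumes v1.x metadata (where --examine prints a "Device Role" line; 0.90 superblocks report the slot differently), and it assumes smartctl and mdadm can both be pointed at the same device node. It only reads from the disks.

```shell
#!/bin/sh
# Hypothetical helper: print "device <TAB> serial <TAB> raid role" per member.
# Assumes v1.x metadata, where `mdadm --examine` prints a "Device Role" line.
map_members() {
    for dev in "$@"; do
        # smartctl -i prints a line such as "Serial Number:    XYZ"
        serial=$(smartctl -i "$dev" | awk -F': *' '/^Serial Number/ {print $2}')
        # mdadm --examine prints a line such as "Device Role : Active device 2"
        role=$(mdadm --examine "$dev" | awk -F' : ' '/Device Role/ {print $2}')
        printf '%s\t%s\t%s\n' "$dev" "${serial:-unknown}" "${role:-unknown}"
    done
}
```

Calling it as root with the member devices as arguments (for example map_members /dev/sdb /dev/sdc, names being placeholders) gives one line per drive to compare against the drive labels before choosing the --create order.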

After re-creating the array, and setting all the drive timeouts to 7.0
seconds, issue a "check" scrub:

echo "check" >/sys/block/md0/md/sync_action

This should clean up the few pending sectors on disk #1 by
reconstruction from the others, and may very well do the same for disk #3.
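Progress of the scrub can be watched through the same sysfs directory. A minimal sketch, assuming the array is md0 as in the command above; scrub_status is a made-up helper name, while sync_action and mismatch_cnt are the standard md sysfs attributes.

```shell
#!/bin/sh
# Hypothetical helper: report the current md sync action and mismatch count.
# The sysfs directory defaults to md0 and can be overridden as the first argument.
scrub_status() {
    md_dir=${1:-/sys/block/md0/md}
    printf 'sync_action=%s mismatch_cnt=%s\n' \
        "$(cat "$md_dir/sync_action")" \
        "$(cat "$md_dir/mismatch_cnt")"
}
```

While the check runs, sync_action reads "check" and returns to "idle" when it finishes; /proc/mdstat shows the percentage done.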

If disk #3 gets kicked out at this point, assemble in degraded mode with
disk #2, #4, and a fresh copy of disk #1 (picking up the new superblock
and any fixes during the partial scrub).  Then "--add" a spare (wiped)
disk and let the array rebuild.
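The fallback described above might look roughly like the following. This is a sketch only: rebuild_plan is a made-up name, all device names are placeholders, and the function merely prints the two mdadm commands so they can be reviewed before anything is run. Depending on the state of the superblocks, the assemble step may need more than --run.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the degraded-assemble and re-add commands.
# Usage: rebuild_plan <md device> <spare device> <remaining member devices...>
rebuild_plan() {
    md=$1
    spare=$2
    shift 2
    # --run starts the array even though one member is missing
    echo "mdadm --assemble --run $md $*"
    # adding a wiped disk to the degraded array triggers the rebuild
    echo "mdadm $md --add $spare"
}
```

For example, rebuild_plan /dev/md0 /dev/sdf /dev/sdb /dev/sdc /dev/sde prints the assemble line with the three surviving members followed by the --add line for the spare.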

And grab your data.

OK, I understand. This seems reasonable to me as well. It is very important to set SCT ERC on *each* drive before starting the check!
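A rough sketch of setting it, assuming the drives support SCT ERC at all: smartctl takes the read and write limits in tenths of a second, so 70,70 means 7.0 seconds. The function name set_erc and the RUN switch are made up for illustration; by default it only prints the commands.

```shell
#!/bin/sh
# Hypothetical helper: set SCT ERC to 7.0 s (70 deciseconds) for reads and writes.
# Dry run by default: prints the commands. Set RUN=1 to actually execute them.
set_erc() {
    for dev in "$@"; do
        cmd="smartctl -l scterc,70,70 $dev"
        if [ "${RUN:-0}" = 1 ]; then
            $cmd
        else
            echo "$cmd"
        fi
    done
}
```

The setting usually does not survive a power cycle, so it has to be reapplied after every reboot, and "smartctl -l scterc /dev/sdX" can be used to read it back for each drive.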

So basically, disk1 being out of sync is considered a minimal risk in this instance, and worth taking a chance on in order to avoid losing the 50kB of data affected by bad sectors, because that data may make all the difference in easily getting the array up, mounted, and the data off the disks.


Chris Murphy
