Thread (4 messages), 3 authors, 2014-10-17

Re: I will pay money for the correct RAID recovery instructions

From: Ian Young <hidden>
Date: 2014-10-16 22:08:07

Ok, if I can pull this off I owe you a beer.

On Thu, Oct 16, 2014 at 1:22 PM, Robin Hill [off-list ref] wrote:
On Thu Oct 16, 2014 at 12:59:18pm -0700, Ian Young wrote:
I've been trying to fix a degraded array for a couple of months now
and it's getting frustrating enough that I'm willing to put a bounty
on the correct solution.  The array can start in a degraded state and
the data is accessible, so I know this is possible to fix.  Any
takers?  I'll bet someone could use some beer money or a contribution
to their web hosting costs.

Here's how the system is set up:  There are (6) 3 TB drives.  Each
drive has a BIOS boot partition.  The rest of the space on each drive
is a large GPT partition that is combined in a RAID 10 array.  On top
of the array there are four LVM volumes: /boot, /root, swap, and /srv.
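
To see which combinations of failed drives an array like this can tolerate, here is a minimal sketch. It assumes md's default RAID 10 "near" layout with 2 copies and that the array slots follow drive-letter order; neither is stated above, and `mdadm --detail /dev/md0` would show the real layout and order. The names `mirror_sets`, `survives` and `DEVICES` are made up for this example.

```python
from typing import List, Sequence, Set, Tuple


def mirror_sets(devices: Sequence[str], copies: int = 2) -> List[Tuple[str, ...]]:
    """Group devices into mirror sets for an md RAID 10 "near" layout.

    Assumes the device count is a multiple of `copies`; in that case
    adjacent devices (in array slot order) hold the copies of each chunk.
    """
    if copies < 1 or len(devices) % copies != 0:
        raise ValueError("device count must be a multiple of the copy count")
    return [tuple(devices[i:i + copies]) for i in range(0, len(devices), copies)]


def survives(devices: Sequence[str], failed: Set[str], copies: int = 2) -> bool:
    """The array keeps its data while every mirror set has one working member."""
    return all(
        any(dev not in failed for dev in group)
        for group in mirror_sets(devices, copies)
    )


# ASSUMPTION: slot order follows drive letters; check with mdadm --detail.
DEVICES = ["sda2", "sdb2", "sdc2", "sdd2", "sde2", "sdf2"]

if __name__ == "__main__":
    print(mirror_sets(DEVICES))
    print(survives(DEVICES, {"sdf2"}))          # one drive of a pair missing
    print(survives(DEVICES, {"sde2", "sdf2"}))  # both drives of a pair missing
```

Under these assumptions sde2 and sdf2 form one mirror set, which would match the symptoms: sdf can only be rebuilt by reading from sde.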

Here's the problem:  /dev/sdf failed.  I replaced it but as it was
resyncing, read errors on /dev/sde kicked the new sdf out and made it
a spare.  The array is now in a precarious degraded state.  All it
would take for the entire array to fail is for /dev/sde to fail, and
it's already showing signs that it will.  I have tried forcing the
array to assemble using /dev/sd[abcde]2 and then forcing it to add
/dev/sdf2.  That still adds sdf2 as a spare.  I've tried "echo check >
/sys/block/md0/md/sync_action" but that finishes immediately and
changes nothing.

If sdf didn't finish syncing then it's no use adding it to the array as
anything other than a spare. Also, you can't run a check on a degraded
array (as there's nothing to check against), which is why that's
finishing immediately.
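
One way to confirm the degraded state before attempting a check is to read the member counters from /proc/mdstat. The sketch below parses a sample excerpt; the sample text is hypothetical (not taken from this system), and `array_status` is an illustrative helper, not part of any md tooling.

```python
import re
from typing import Dict, Optional

# Hypothetical /proc/mdstat excerpt for illustration; real output will differ.
SAMPLE_MDSTAT = """\
Personalities : [raid10]
md0 : active raid10 sdf2[6](S) sde2[4] sdd2[3] sdc2[2] sdb2[1] sda2[0]
      8790400512 blocks super 1.2 512K chunks 2 near-copies [6/5] [UUUUU_]

unused devices: <none>
"""

# Matches the "[expected/active] [UU_...]" pair on the array's detail line.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def array_status(mdstat_text: str, name: str) -> Optional[Dict[str, object]]:
    """Return member counts for array `name`, or None if it is not listed."""
    lines = mdstat_text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith(name + " :"):
            continue
        # The counters sit on one of the indented lines that follow.
        for follow in lines[i + 1:]:
            if not follow.startswith(" "):
                break
            match = STATUS_RE.search(follow)
            if match:
                expected, active = int(match.group(1)), int(match.group(2))
                return {
                    "expected": expected,
                    "active": active,
                    "slots": match.group(3),
                    "degraded": active < expected,
                }
    return None


if __name__ == "__main__":
    print(array_status(SAMPLE_MDSTAT, "md0"))
```

On a live system the text would come from reading /proc/mdstat instead of the sample string.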

If sde is giving a read error during rebuild then the solution is to
stop the array (you'll need to do this via a bootable CD/USB stick I
guess) and use ddrescue to duplicate sde onto a new disk. The
read errors may well mean that some data can't be copied (though ddrescue
will try very hard to do so), which may cause file/filesystem corruption
later. You can then reassemble the (degraded) array with the old sda-sdd
and the new sde, then add sdf and wait for the array to recover. You
can then run a fsck on the filesystem to check for any corruption there.
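
The sequence above can be sketched as a dry-run plan that only prints the commands. Everything specific here is an assumption: the clone target /dev/sdg, the mapfile name sde.map, the two-pass ddrescue approach, and the helper `recovery_plan` itself. Device names must be checked against the real system, and the old sde should not be passed to mdadm once the clone exists.

```python
import shlex
from typing import List, Sequence


def recovery_plan(array: str, good_members: Sequence[str], failing_disk: str,
                  clone_disk: str, clone_member: str, spare_member: str,
                  mapfile: str) -> List[List[str]]:
    """Build the command list; nothing is executed here."""
    return [
        # Stop the array so the failing disk is not in use while it is copied.
        ["mdadm", "--stop", array],
        # First pass: copy the readable areas and skip the slowest phase.
        ["ddrescue", "-f", "-n", failing_disk, clone_disk, mapfile],
        # Second pass: retry the bad areas up to 3 times, reusing the mapfile.
        ["ddrescue", "-f", "-r3", failing_disk, clone_disk, mapfile],
        # Reassemble degraded, with the clone standing in for the failing disk.
        ["mdadm", "--assemble", "--force", array, *good_members, clone_member],
        # Add the replacement drive and let the array rebuild onto it.
        ["mdadm", array, "--add", spare_member],
    ]


# ASSUMPTION: /dev/sdg is the blank disk receiving the copy of sde.
PLAN = recovery_plan(
    array="/dev/md0",
    good_members=["/dev/sda2", "/dev/sdb2", "/dev/sdc2", "/dev/sdd2"],
    failing_disk="/dev/sde",
    clone_disk="/dev/sdg",
    clone_member="/dev/sdg2",
    spare_member="/dev/sdf2",
    mapfile="sde.map",
)

if __name__ == "__main__":
    for command in PLAN:
        print(" ".join(shlex.quote(arg) for arg in command))
```

Printing rather than running keeps the sketch safe to try; each line would still need review before being run by hand from the rescue environment.
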
File corruption is a lot trickier to spot - if you have checksums for
the files then that's one way, otherwise you may be able to work out
what files are affected based on the offsets of the missing data (that's
rather beyond the limits of my knowledge though).
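
As a starting point for finding the offsets of the missing data, the ddrescue mapfile can be read to list the areas that were not recovered. This is a minimal sketch assuming the mapfile layout described in the GNU ddrescue manual (comment lines starting with `#`, one status line, then `pos size status` rows in hex, with `+` meaning recovered). The sample contents and the helper `unrecovered_ranges` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical mapfile contents; positions and sizes are byte counts in hex.
SAMPLE_MAPFILE = """\
# Mapfile. Created by GNU ddrescue
# current_pos  current_status
0x00120000     +
#      pos        size  status
0x00000000  0x00117000  +
0x00117000  0x00000200  -
0x00117200  0x00008E00  +
"""


def unrecovered_ranges(mapfile_text: str, sector_size: int = 512) -> List[Tuple[int, int]]:
    """Return (first_sector, sector_count) for every area not marked '+'."""
    rows = [
        line.split()
        for line in mapfile_text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    ranges = []
    for fields in rows[1:]:  # rows[0] is the current-position status line
        if len(fields) < 3:
            continue
        pos, size, status = int(fields[0], 0), int(fields[1], 0), fields[2]
        if status != "+" and size > 0:
            first = pos // sector_size
            last = (pos + size - 1) // sector_size
            ranges.append((first, last - first + 1))
    return ranges


if __name__ == "__main__":
    for first, count in unrecovered_ranges(SAMPLE_MAPFILE):
        print(f"sectors {first}..{first + count - 1} ({count} unreadable)")
```

These are sector numbers on the cloned disk only. Turning them into file names would still require the partition start, the md data offset and chunk layout, the LVM extent map and filesystem tools, none of which is attempted here.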

HTH,
    Robin
--
     ___
    ( ' }     |       Robin Hill        [off-list ref] |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |