Thread: 9 messages, 2 authors, 2013-06-04

Re: Wiki-recovering failed raid, overlay problem

From: Chris Finley <hidden>
Date: 2013-06-02 00:40:15

On Sat, Jun 1, 2013 at 4:30 PM, Phil Turmel [off-list ref] wrote:
Hi Chris,

On 06/01/2013 02:23 AM, Chris Finley wrote:
quoted
I am trying to recover a failed Raid 5 array by following the guide at
https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
Stop.  Report the *critical* details of your setup.  At least:
Thank you for the reply.

Oh, yes. I'm the guy from an earlier post:
http://marc.info/?l=linux-raid&m=136840333618808&w=2

The pastebins include mdadm -E and smartmontools output. I included
them as attachments at your request.

Two drives dropped out of the RAID 5 for what appears to be read
errors. One partition (sde1) had a much lower event count. I was going
to try forced assembly with the data from the other three drives (sdc,
sdd and sdf).

I have not tried to force assembly yet.

sdd had quite a few read errors, so I used ddrescue to copy the data
from sdd to sdb. Thus, my new failed set would be sdb, sdc, sde. The
low event count fourth drive is now sdd.
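
The event-count comparison described above can be scripted. This is only a minimal sketch: it assumes the usual `mdadm -E` text layout (a "/dev/xxx:" header line per device, followed by an "Events : N" line), and the function name `event_counts` is made up for illustration. It only reads text on stdin and never touches the disks.

```shell
#!/bin/sh
# Sketch: list "device events" pairs from `mdadm -E` output read on stdin.
# Assumption: each device section starts with a "/dev/xxx:" header line and
# contains an "Events : N" line. Purely a text filter; nothing is written.
event_counts() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }    # remember current device
        $1 == "Events" && $2 == ":" { print dev, $3 }  # emit its event count
    '
}

# Hypothetical usage (device names are examples only):
#   mdadm -E /dev/sd[bcde]1 | event_counts
```

A member whose count is far behind the others (as sde1 was here) is the one that would normally be left out of a forced assembly.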
1) "mdadm -E /dev/sdXX" for every member device of the raid.
attached: raid.status.txt
This is before attempting anything on the wiki.
2) "dmesg" or suitable portions of syslog, showing the last attempted
assembly, the first failed assembly, the failure events that started
your saga, and the last pre-failure assembly.
Not much from the last boot, but I do see ioctl in there.

[    0.218206] pnp 00:02: [dma 4]
[    0.570352] device-mapper: ioctl: 4.22.0-ioctl (2011-10-19) initialised: dm-devel@redhat.com
[    2.712503] device-mapper: dm-raid45: initialized v0.2594b

[    2.715231] md: linear personality registered for level -1
[    2.717072] md: multipath personality registered for level -4
[    2.722254] md: raid0 personality registered for level 0
[    2.727948] md: raid1 personality registered for level 1
[    2.861915] md: bind<sdb1>
[   11.676186] md: raid6 personality registered for level 6
[   11.676188] md: raid5 personality registered for level 5
[   11.676189] md: raid4 personality registered for level 4
[   11.684692] md: raid10 personality registered for level 10
[   11.842658] md: bind<sde1>
[   11.844580] md: bind<sdc1>
[   11.891707] md: bind<sdd1>


3) an account of all "mdadm" commands you've already used and their results.
only:
mdadm -E
mdadm --stop /dev/md0
4) an account of any other operations you've performed that might have
written to the member disks.
badblocks -v on the last drive to drop out of the RAID (sdd),
then ddrescue to move the data from sdd to sdb.
quoted
Things go fine until I get to the command under "Setup the loop-device
and the overlay device:"

parallel 'size=$(blockdev --getsize {});
  loop=$(losetup -f --show -- overlay-{/});
  echo 0 $size snapshot {} $loop P 8 | dmsetup create {/}' ::: $DEVICES

This command gets me:
device-mapper: reload ioctl failed: Device or resource busy
Command failed
device-mapper: reload ioctl failed: Device or resource busy
Command failed
device-mapper: reload ioctl failed: Device or resource busy
Command failed
device-mapper: reload ioctl failed: Device or resource busy
Command failed
Your array is probably still partially assembled.  The wiki is lame.
This mailing list is the right place to get help.  (I'm rather biased
against wikis for this sort of thing, but that's off-topic.)
You are probably right. After rebooting for several of the steps
(including drive addition/removal) it appears the OS tried to
reassemble the array. It does tell you that there is a failed array
and asks if you'd like to attempt to start the array anyway. It
appears that answering "No" still gets a partially assembled array. I
have stopped the array, but I'll wait for advice before attempting the
overlay again.

# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdd1[2](S) sdc1[0](S) sde1[3](S) sdb1[1](S)
      7814047744 blocks
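
A minimal sketch for checking this before retrying the overlay step: it lists the member devices that /proc/mdstat shows as bound to any md array, since a device still held by md (even by an inactive array like the one above) is the likely cause of the "Device or resource busy" errors. The function name `md_members` is hypothetical, and the parsing assumes the mdstat layout shown above.

```shell
#!/bin/sh
# Sketch: print the member devices that /proc/mdstat lists under any md array.
# Takes the mdstat path as $1 (default /proc/mdstat) so it can be run on a copy.
# Assumption: array lines look like "md0 : inactive sdd1[2](S) sdc1[0](S) ...".
md_members() {
    awk '
        /^md[0-9]+ :/ {
            for (i = 4; i <= NF; i++) {     # fields after "mdN : state"
                dev = $i
                sub(/\[.*/, "", dev)        # strip the "[2](S)" role/flag suffix
                # only fields carrying a "[n]" suffix are members (skips "raid5")
                if (dev ~ /^[a-z]/ && $i ~ /\[/) print "/dev/" dev
            }
        }
    ' "${1:-/proc/mdstat}"
}
```

If this prints any of the devices intended for overlays, the array is still holding them; `mdadm --stop /dev/md0`, as already used above, should release them, after which the function should print nothing.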


quoted
The drives are not mounted. I am booting to a system on sda. I tried
this in single-user mode with the same result. I tried searching for
dmsetup help without luck.

Any advice on the cause of this error would be greatly appreciated.

The overlays are created in my current directory at 2.1TB each:
-rw-r--r-- 1 root root 2.1T May 30 21:23 overlay-sdb1
-rw-r--r-- 1 root root 2.1T May 30 21:23 overlay-sdc1
-rw-r--r-- 1 root root 2.1T May 30 21:23 overlay-sdd1
-rw-r--r-- 1 root root 2.1T May 30 21:23 overlay-sde1

The loop devices appear to be created:
root@mythserver:~# losetup -a
/dev/loop0: [0807]:58851784 (/root/overlay-sdb1)
/dev/loop1: [0807]:58851786 (/root/overlay-sdc1)
/dev/loop2: [0807]:58851787 (/root/overlay-sdd1)
/dev/loop3: [0807]:58851792 (/root/overlay-sde1)

These are the entries that are piped into 'dmsetup create {/}':
0 3907024002 snapshot /dev/sdb1 /dev/loop0 P 8
0 3907024002 snapshot /dev/sdc1 /dev/loop1 P 8
0 3907024002 snapshot /dev/sdd1 /dev/loop2 P 8
0 3907024002 snapshot /dev/sde1 /dev/loop3 P 8
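
For reference, each of those lines is a device-mapper table for the "snapshot" target: start sector, length in 512-byte sectors, target name, origin device, copy-on-write device, P for a persistent snapshot, and a chunk size of 8 sectors. A small sketch that builds one such line follows; the function name `snapshot_table` is made up for illustration, and the commented usage simply mirrors the wiki command for a single device.

```shell
#!/bin/sh
# Sketch: build the one-line dm "snapshot" table that gets piped to dmsetup.
# Arguments: size in 512-byte sectors, origin device, COW (loop) device.
# "P 8" = persistent snapshot, chunk size 8 sectors, as in the wiki command.
snapshot_table() {
    size=$1 origin=$2 cow=$3
    printf '0 %s snapshot %s %s P 8\n' "$size" "$origin" "$cow"
}

# Hypothetical usage as root, once md no longer holds the device
# (names are illustrative only):
#   size=$(blockdev --getsize /dev/sdb1)
#   loop=$(losetup -f --show -- overlay-sdb1)
#   snapshot_table "$size" /dev/sdb1 "$loop" | dmsetup create sdb1
```

Writes to the resulting /dev/mapper device land in the overlay file, so the origin partition is left unmodified.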

Nothing has been created in /dev/mapper/
root@mythserver:~# l /dev/mapper/
total 0
crw------- 1 root root 10, 236 May 30 23:55 control
These exercises to make overlays are rarely needed, and don't appear to
have been created as intended.

Please just round up the data requested and report back.  (Paste text
inline, or use plain text attachments, please.)

We may want more data later (like smartctl reports), but items #1-#4 are
needed now.
Because each of the drives had some read errors, I thought it would be
safer to make the first attempt with overlays. There is always the
possibility of me entering a command incorrectly too :)

Phil
