Re: Wiki-recovering failed raid, overlay problem
From: Chris Finley <hidden>
Date: 2013-06-02 00:40:15
On Sat, Jun 1, 2013 at 4:30 PM, Phil Turmel [off-list ref] wrote:
Hi Chris,
On 06/01/2013 02:23 AM, Chris Finley wrote:
I am trying to recover a failed RAID 5 array by following the guide at https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID

Stop. Report the *critical* details of your setup. At least:
Thank you for the reply. Oh, yes. I'm the guy from an earlier post: http://marc.info/?l=linux-raid&m=136840333618808&w=2

The pastebins include mdadm -E and smartmontools output. I included them as attachments at your request.

Two drives dropped out of the RAID 5 for what appear to be read errors. One partition (sde1) had a much lower event count. I was going to try forced assembly with the data from the other three drives (sdc, sdd and sdf). I have not tried to force assembly yet.

sdd had quite a few read errors, so I used ddrescue to copy the data from sdd to sdb. Thus, my new failed set would be sdb, sdc, sde. The low event count fourth drive is now sdd.
1) "mdadm -E /dev/sdXX" for every member device of the raid.
attached: raid.status.txt This is before attempting anything on the wiki.
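Since the event counts decide which members are candidates for a forced assembly, here is a minimal sketch for pulling that one field out of "mdadm -E" output. The function name events_from_examine is made up for illustration, and it assumes the output contains a line of the form "Events : N".

```shell
# Extract the event count from `mdadm -E` output supplied on stdin.
# Prints the value of the first "Events : N" line and nothing else.
events_from_examine() {
    awk -F: '/^[ \t]*Events[ \t]*:/ {
        gsub(/[ \t]/, "", $2)   # strip padding around the number
        print $2
        exit
    }'
}

# Usage sketch (needs root; device names are the ones from this thread):
# for d in /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1; do
#     printf '%s %s\n' "$d" "$(mdadm -E "$d" | events_from_examine)"
# done
```

A member whose count is far behind the others, as reported above for one partition, is the one to leave out of a first forced assembly attempt.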
2) "dmesg" or suitable portions of syslog, showing the last attempted assembly, the first failed assembly, the failure events that started your saga, and the last pre-failure assembly.
Not much from the last boot, but I do see ioctl in there.

[ 0.218206] pnp 00:02: [dma 4]
[ 0.570352] device-mapper: ioctl: 4.22.0-ioctl (2011-10-19) initialised: dm-devel@redhat.com
[ 2.712503] device-mapper: dm-raid45: initialized v0.2594b
[ 2.715231] md: linear personality registered for level -1
[ 2.717072] md: multipath personality registered for level -4
[ 2.722254] md: raid0 personality registered for level 0
[ 2.727948] md: raid1 personality registered for level 1
[ 2.861915] md: bind<sdb1>
[ 11.676186] md: raid6 personality registered for level 6
[ 11.676188] md: raid5 personality registered for level 5
[ 11.676189] md: raid4 personality registered for level 4
[ 11.684692] md: raid10 personality registered for level 10
[ 11.842658] md: bind<sde1>
[ 11.844580] md: bind<sdc1>
[ 11.891707] md: bind<sdd1>
3) an account of all "mdadm" commands you've already used and their results.
only:
mdadm -E
mdadm --stop /dev/md0
4) an account of any other operations you've performed that might have written to the member disks.
badblocks -v on the last drive to drop out of the raid (sdd), then ddrescue to move the data from sdd to sdb.
Things go fine until I get to the command under "Setup the loop-device and the overlay device:"

parallel 'size=$(blockdev --getsize {}); loop=$(losetup -f --show -- overlay-{/}); echo 0 $size snapshot {} $loop P 8 | dmsetup create {/}' ::: $DEVICES

This command gets me:

device-mapper: reload ioctl failed: Device or resource busy
Command failed
device-mapper: reload ioctl failed: Device or resource busy
Command failed
device-mapper: reload ioctl failed: Device or resource busy
Command failed
device-mapper: reload ioctl failed: Device or resource busy
Command failed

Your array is probably still partially assembled. The wiki is lame. This mailing list is the right place to get help. (I'm rather biased against wikis for this sort of thing, but that's off-topic.)
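A "Device or resource busy" from dmsetup is consistent with the partitions still being claimed by md. One way to check before retrying is to look at each partition's holders directory in sysfs. This is only a sketch: holders_of is a made-up helper name, and its second argument (an alternate sysfs root) exists purely so it can be tried without real devices.

```shell
# List the kernel devices that currently hold a block device
# (for example md0 holding sdb1). Empty output means nothing holds it.
# Returns non-zero if the device has no entry under sysfs.
holders_of() {
    dev=${1##*/}                 # /dev/sdb1 -> sdb1
    sysroot=${2:-/sys}           # hypothetical override, for testing only
    dir="$sysroot/class/block/$dev/holders"
    [ -d "$dir" ] || return 1
    ls "$dir"
}

# Usage sketch:
# for d in $DEVICES; do
#     printf '%s: %s\n' "$d" "$(holders_of "$d")"
# done
```

If a partition lists md0 as a holder, the array has to be stopped with mdadm --stop /dev/md0 before a snapshot can be created on top of that partition.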
You are probably right. After rebooting for several of the steps
(including drive addition/removal) it appears the OS tried to
reassemble the array. It does tell you that there is a failed array
and asks if you'd like to attempt to start the array anyway. It
appears that answering "No" still gets a partially assembled array. I
have stopped the array, but I'll wait for advice before attempting the
overlay again.
# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdd1[2](S) sdc1[0](S) sde1[3](S) sdb1[1](S)
7814047744 blocks
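Output like the above, where md0 is listed as inactive with every member marked (S), is what a partially assembled array looks like. A small sketch for spotting such arrays after each reboot follows. The function name inactive_arrays is made up, and the optional file argument is there only so saved output can be checked.

```shell
# Print the names of md arrays that /proc/mdstat reports as inactive.
# An optional file argument is read instead of /proc/mdstat.
inactive_arrays() {
    awk '$2 == ":" && $3 == "inactive" { print $1 }' "${1:-/proc/mdstat}"
}

# Usage sketch (needs root): stop every inactive array so that its
# member partitions are released.
# for md in $(inactive_arrays); do
#     mdadm --stop "/dev/$md"
# done
```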
The drives are not mounted. I am booting to a system on sda. I tried this in single-user mode with the same result. I tried searching for dmsetup help without luck. Any advice on the cause of this error would be greatly appreciated.

The overlays are created in my current directory at 2.1TB each:

-rw-r--r-- 1 root root 2.1T May 30 21:23 overlay-sdb1
-rw-r--r-- 1 root root 2.1T May 30 21:23 overlay-sdc1
-rw-r--r-- 1 root root 2.1T May 30 21:23 overlay-sdd1
-rw-r--r-- 1 root root 2.1T May 30 21:23 overlay-sde1

The loop devices appear to be created:

root@mythserver:~# losetup -a
/dev/loop0: [0807]:58851784 (/root/overlay-sdb1)
/dev/loop1: [0807]:58851786 (/root/overlay-sdc1)
/dev/loop2: [0807]:58851787 (/root/overlay-sdd1)
/dev/loop3: [0807]:58851792 (/root/overlay-sde1)

These are the entries that are piped into 'dmsetup create {/}':

0 3907024002 snapshot /dev/sdb1 /dev/loop0 P 8
0 3907024002 snapshot /dev/sdc1 /dev/loop1 P 8
0 3907024002 snapshot /dev/sdd1 /dev/loop2 P 8
0 3907024002 snapshot /dev/sde1 /dev/loop3 P 8

Nothing has been created in /dev/mapper/

root@mythserver:~# l /dev/mapper/
total 0
crw------- 1 root root 10, 236 May 30 23:55 control

These exercises to make overlays are rarely needed, and don't appear to have been created as intended. Please just round up the data requested and report back. (Paste text inline, or use plain text attachments, please.) We may want more data later (like smartctl reports), but items #1-#4 are needed now.
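For reference, the table entries quoted above follow the device-mapper snapshot target layout: start sector, length in 512-byte sectors, the word snapshot, the origin device, the copy-on-write device, P for a persistent snapshot, and a chunk size of 8 sectors. A minimal sketch that builds one such line, with snapshot_table as a made-up helper name:

```shell
# Build one dm "snapshot" table line.
#   $1 = origin size in 512-byte sectors (as printed by blockdev --getsize)
#   $2 = origin device, $3 = copy-on-write (loop) device
# "P 8" requests a persistent snapshot with 8-sector (4 KiB) chunks.
snapshot_table() {
    size=$1 origin=$2 cow=$3
    printf '0 %s snapshot %s %s P 8\n' "$size" "$origin" "$cow"
}

# Usage sketch (needs root, and the origin must not be held by md):
# snapshot_table "$(blockdev --getsize /dev/sdb1)" /dev/sdb1 /dev/loop0 |
#     dmsetup create sdb1
```

Printing the line first and only then piping it into dmsetup makes it easy to confirm that a failure comes from the busy origin device and not from a malformed table.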
Because each of the drives had some read errors, I thought it would be safer to make the first attempt with overlays. There is always the possibility of me entering a command incorrectly too :)
Phil
Attachments
- raid.status.txt [text/plain] 4161 bytes
- smart_all.txt [text/plain] 34016 bytes