RE: can i recover an all spare raid10 array ?
From: Roland RoLaNd <hidden>
Date: 2014-10-28 19:11:21
----------------------------------------
> Date: Tue, 28 Oct 2014 18:34:22 +0000
> From: robin@robinhill.me.uk
> To: r_o_l_a_n_d@hotmail.com
> CC: robin@robinhill.me.uk; linux-raid@vger.kernel.org
> Subject: Re: can i recover an all spare raid10 array ?
>
> Please don't top post, it makes conversations very difficult to follow.
> Responses should go at the bottom, or interleaved with the previous post
> if responding to particular points. I've moved your previous responses to
> keep the conversation flow straight.
>
> On Tue Oct 28, 2014 at 07:30:50PM +0200, Roland RoLaNd wrote:
>> From: r_o_l_a_n_d@hotmail.com
>> To: robin@robinhill.me.uk
>> CC: linux-raid@vger.kernel.org
>> Subject: Re: can i recover an all spare raid10 array ?
>> Date: Tue, 28 Oct 2014 19:29:25 +0200
>>
>>> Date: Tue, 28 Oct 2014 17:01:11 +0000
>>> From: robin@robinhill.me.uk
>>> To: r_o_l_a_n_d@hotmail.com
>>> CC: linux-raid@vger.kernel.org
>>> Subject: Re: can i recover an all spare raid10 array ?
>>>
>>> On Tue Oct 28, 2014 at 06:22:11PM +0200, Roland RoLaNd wrote:
>>>> I have two raid arrays on my system:
>>>>
>>>> raid1:  /dev/sdd1 /dev/sdh1
>>>> raid10: /dev/sde1 /dev/sda1 /dev/sdf1 /dev/sdb1 /dev/sdc1 /dev/sdg1
>>>>
>>>> two disks had bad sectors: sdd and sdf <<-- they both got hot swapped.
>>>> i added sdf back to raid10 and recovery took place, but adding sdd1 to
>>>> raid1 proved to be troublesome. as i didn't have anything important on
>>>> '/' i formatted and installed ubuntu 14 on raid1.
>>>>
>>>> now system is up on raid 1, but raid10 (md127) is inactive
>>>>
>>>> cat /proc/mdstat
>>>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
>>>> md127 : inactive sde1[2](S) sdg1[8](S) sdc1[6](S) sdb1[5](S) sdf1[4](S) sda1[3](S)
>>>>       17580804096 blocks super 1.2
>>>>
>>>> md2 : active raid1 sdh4[0] sdd4[1]
>>>>       2921839424 blocks super 1.2 [2/2] [UU]
>>>>       [==>..................] resync = 10.4% (304322368/2921839424) finish=672.5min speed=64861K/sec
>>>>
>>>> md1 : active raid1 sdh3[0] sdd3[1]
>>>>       7996352 blocks super 1.2 [2/2] [UU]
>>>>
>>>> md0 : active raid1 sdh2[0] sdd2[1]
>>>>       292544 blocks super 1.2 [2/2] [UU]
>>>>
>>>> unused devices: <none>
>>>>
>>>> if i try to assemble md127
>>>>
>>>> mdadm --assemble /dev/md127 /dev/sde1 /dev/sda1 /dev/sdf1 /dev/sdb1 /dev/sdc1 /dev/sdg1
>>>> mdadm: /dev/sde1 is busy - skipping
>>>> mdadm: /dev/sda1 is busy - skipping
>>>> mdadm: /dev/sdf1 is busy - skipping
>>>> mdadm: /dev/sdb1 is busy - skipping
>>>> mdadm: /dev/sdc1 is busy - skipping
>>>> mdadm: /dev/sdg1 is busy - skipping
>>>>
>>>> if i try to add one of the disks:
>>>>
>>>> mdadm --add /dev/md127 /dev/sdj1
>>>> mdadm: cannot get array info for /dev/md127
>>>>
>>>> if i try:
>>>>
>>>> mdadm --stop /dev/md127
>>>> mdadm: stopped /dev/md127
>>>>
>>>> then running:
>>>>
>>>> mdadm --assemble /dev/md127 /dev/sde1 /dev/sda1 /dev/sdf1 /dev/sdb1 /dev/sdc1 /dev/sdg1
>>>>
>>>> returns:
>>>>
>>>> assembled from 5 drives and 1 rebuilding - not enough to start the array
>>>>
>>>> what does it mean ? is my data lost ?
>>>>
>>>> if i examine one of the md127 raid 10 array disks it shows this:
>>>>
>>>> mdadm --examine /dev/sde1
>>>> /dev/sde1:
>>>>           Magic : a92b4efc
>>>>         Version : 1.2
>>>>     Feature Map : 0x0
>>>>      Array UUID : ab90d4c8:41a55e14:635025cc:28f0ee76
>>>>            Name : ubuntu:data (local to host ubuntu)
>>>>   Creation Time : Sat May 10 21:54:56 2014
>>>>      Raid Level : raid10
>>>>    Raid Devices : 8
>>>>
>>>>  Avail Dev Size : 5860268032 (2794.39 GiB 3000.46 GB)
>>>>      Array Size : 11720534016 (11177.57 GiB 12001.83 GB)
>>>>   Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB)
>>>>     Data Offset : 262144 sectors
>>>>    Super Offset : 8 sectors
>>>>           State : clean
>>>>     Device UUID : a2a5db61:bd79f0ae:99d97f17:21c4a619
>>>>
>>>>     Update Time : Tue Oct 28 10:07:18 2014
>>>>        Checksum : 409deeb4 - correct
>>>>          Events : 8655
>>>>
>>>>          Layout : near=2
>>>>      Chunk Size : 512K
>>>>
>>>>     Device Role : Active device 2
>>>>     Array State : AAAAAAAA ('A' == active, '.' == missing)
>>>>
>>>> Used Dev Size : 5860267008 (2794.39 GiB 3000.46 GB) <<--- does this mean i still have my data ?
>>>>
>>>> the remaining two disks:
>>>>
>>>> mdadm --examine /dev/sdj1
>>>> mdadm: No md superblock detected on /dev/sdj1.
>>>> mdadm --examine /dev/sdi1
>>>> mdadm: No md superblock detected on /dev/sdi1.
>>>
>>> The --examine output indicates the RAID10 array was 8 members, not 6.
>>> As it stands, you are missing two array members (presumably a mirrored
>>> pair as mdadm won't start the array). Without these you're missing 512K
>>> of every 2M in the array, so your data is toast (well, with a lot of
>>> effort you may recover some files under 1.5M in size).
>>>
>>> Were you expecting sdi1 and sdj1 to have been part of the original
>>> RAID10 array? Have you removed the superblocks from them at any point?
>>>
>>> For completeness, what mdadm and kernel versions are you running?
>>>
>>> Cheers,
>>>     Robin
>>
>> Thanks for pitching in. here are the responses to your questions:
>>
>> - yes i expected both of them to be part of the array, though one of
>>   them was just added to the array and didn't finish recovering when
>>   raid1 "/" crashed
>
> According to your --examine earlier, the RAID10 rebuild had completed (it
> shows the array clean and having all disks active).
>
> Are you certain that the new RAID1 array isn't using disks that used to
> be part of the RAID10 array? Regardless, I'd expect the disks to have a
> superblock if they were part of either array (unless they've been
> repartitioned?).
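
Robin's earlier figure of 512K lost from every 2M can be checked against the values in the --examine output (8 raid devices, near=2, 512K chunks). What follows is only a minimal Python sketch. It assumes the usual near-layout placement, where the copies of a chunk sit on adjacent slots and the device count is a multiple of the copy count. The function names are made up for illustration, and slots 6 and 7 are a hypothetical choice of missing pair, since the thread does not say which slots are actually gone.

```python
# Values taken from the --examine output quoted in the thread.
CHUNK_KIB = 512      # "Chunk Size : 512K"
RAID_DEVICES = 8     # "Raid Devices : 8"
NEAR_COPIES = 2      # "Layout : near=2"


def copies_of_chunk(chunk, raid_devices=RAID_DEVICES, near=NEAR_COPIES):
    """Slots holding the copies of one logical chunk (near layout).

    Assumes raid_devices is a multiple of near, as it is here.
    """
    first = (chunk * near) % raid_devices
    return [(first + k) % raid_devices for k in range(near)]


def lost_per_stripe(missing_slots, raid_devices=RAID_DEVICES,
                    near=NEAR_COPIES, chunk_kib=CHUNK_KIB):
    """Return (KiB lost, KiB of data) for one full stripe across all slots."""
    missing = set(missing_slots)
    chunks_per_stripe = raid_devices // near
    lost_chunks = sum(
        1
        for chunk in range(chunks_per_stripe)
        if all(slot in missing for slot in copies_of_chunk(chunk, raid_devices, near))
    )
    return lost_chunks * chunk_kib, chunks_per_stripe * chunk_kib


if __name__ == "__main__":
    # Hypothetical: both halves of one mirrored pair (slots 6 and 7) are gone.
    lost, stripe = lost_per_stripe([6, 7])
    print(f"{lost}K lost out of every {stripe}K")
```

With both members of one pair missing, one chunk in four has no surviving copy, which is where the 512K per 2M comes from. If the two missing slots belong to different pairs, the same function reports no loss.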
the examine earlier was to one of the 6 disks that belong to the current
inactive array.. they're all clean

as for raid1/10 arrays, that's what i thought as it happened with me
before, but lsblk shows the following:

NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda           8:0    0   2.7T  0 disk
└─sda1        8:1    0   2.7T  0 part
sdb           8:16   0   2.7T  0 disk
└─sdb1        8:17   0   2.7T  0 part
sdc           8:32   0   2.7T  0 disk
└─sdc1        8:33   0   2.7T  0 part
sdd           8:48   0   2.7T  0 disk
├─sdd1        8:49   0     1M  0 part
├─sdd2        8:50   0   286M  0 part
│ └─md0       9:0    0 285.7M  0 raid1 /boot
├─sdd3        8:51   0   7.6G  0 part
│ └─md1       9:1    0   7.6G  0 raid1 [SWAP]
└─sdd4        8:52   0   2.7T  0 part
  └─md2       9:2    0   2.7T  0 raid1 /
sde           8:64   0   2.7T  0 disk
└─sde1        8:65   0   2.7T  0 part
sdf           8:80   0   2.7T  0 disk
└─sdf1        8:81   0   2.7T  0 part
sdg           8:96   0   2.7T  0 disk
└─sdg1        8:97   0   2.7T  0 part
sdh           8:112  0   2.7T  0 disk
├─sdh1        8:113  0     1M  0 part
├─sdh2        8:114  0   286M  0 part
│ └─md0       9:0    0 285.7M  0 raid1 /boot
├─sdh3        8:115  0   7.6G  0 part
│ └─md1       9:1    0   7.6G  0 raid1 [SWAP]
└─sdh4        8:116  0   2.7T  0 part
  └─md2       9:2    0   2.7T  0 raid1 /
sdi           8:128  0   2.7T  0 disk
└─sdi1        8:129  0   2.7T  0 part
sdj           8:144  0   2.7T  0 disk
└─sdj1        8:145  0   2.7T  0 part
>> - i have not removed their superblocks, or at least not in a way that i
>>   am aware of
>> - mdadm: 3.2.5-5ubuntu4.1
>> - uname -a: 3.13.0-24-generic
>
> That's a pretty old mdadm version, but I don't see anything in the change
> logs that looks relevant. Others may be more familiar with issues though.
that's the latest in my current ubuntu repository
>> PS: I just followed this recovery page:
>> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
>> I managed to reach the last step, whenever i tried to mount it kept
>> asking me for the right file system
>
> That's good documentation anyway. As long as you stick to the overlay
> devices your original data is untouched. It's amazing how many people run
> --create on their original disks and lose any chance of getting the data
> back.
unfortunately i used to be/am one of those people. i had bad experiences
with this before, so i took it slow and went with the overlay
documentation. all the ebooks i could find about raid talk about the
differences between raid levels, but none are thorough when it comes to
setting up/troubleshooting raid. and once i do fix my issue, i move on to
the next firefighting situation, so i lose interest due to lack of time.
>> Correction: i couldn't force assemble the read devices so i issued
>> instead:
>>
>> mdadm --create /dev/md089 --assume-clean --level=10 --verbose --raid-devices=8 missing /dev/dm-1 /dev/dm-0 /dev/dm-5 /dev/dm-3 /dev/dm-2 missing /dev/dm-4
>>
>> which got it into degraded state
>
> What error did you get when you tried to force assemble (both from mdadm
> and anything reported via dmesg)? The device order you're using would
> suggest that the missing disks wouldn't be mirrors of each other, so the
> data should be okay.
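
Robin's remark about the device order can be checked by pairing up the slots from the --create command line. A minimal Python sketch, assuming the near=2 layout reported by --examine, in which consecutive slots (0,1), (2,3), ... hold the two copies of the same data. The helper names are hypothetical and not part of mdadm.

```python
# Slot order exactly as given to "mdadm --create" in the thread.
CREATE_ORDER = [
    "missing", "/dev/dm-1", "/dev/dm-0", "/dev/dm-5",
    "/dev/dm-3", "/dev/dm-2", "missing", "/dev/dm-4",
]


def mirror_sets(devices, near=2):
    """Group consecutive slots into mirror sets (near layout assumption)."""
    return [devices[i:i + near] for i in range(0, len(devices), near)]


def dead_sets(devices, near=2):
    """Mirror sets in which every member is missing, so that data is gone."""
    return [s for s in mirror_sets(devices, near)
            if all(dev == "missing" for dev in s)]


if __name__ == "__main__":
    for index, members in enumerate(mirror_sets(CREATE_ORDER)):
        print(index, members)
    print("sets with no surviving copy:", dead_sets(CREATE_ORDER))
```

With this order the two "missing" slots are 0 and 6, which fall into different sets, so each set still has one surviving member. This only holds if the slot order passed to --create matches the original device roles.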
mdadm --assemble --force /dev/md100 $OVERLAYS
mdadm: /dev/md100 assembled from 5 drives and 1 rebuilding - not enough to start the array.

dmesg:
[ 6025.573964] md: md100 stopped.
[ 6025.595810] md: bind<dm-0>
[ 6025.596086] md: bind<dm-5>
[ 6025.596364] md: bind<dm-2>
[ 6025.596612] md: bind<dm-1>
[ 6025.596840] md: bind<dm-4>
[ 6025.597026] md: bind<dm-3>
> Can you post the --examine results for all the RAID members? Both for the
> original partitions and for the overlay devices after you recreated the
> array. There may be differences in data offset, etc. which will break the
> filesystem.
Original partitions: http://pastebin.com/nHCxidvE
overlay: http://pastebin.com/eva4cnu6
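
The comparison Robin asks for can be done by eye, but a small script makes differences in fields such as the data offset harder to miss. A minimal Python sketch, assuming the "Key : value" line format shown in the --examine output earlier in the thread. The ORIGINAL sample reuses values from that output. The RECREATED sample, including its 2048-sector offset, is invented purely to exercise the comparison and is not the content of the pastebin links. The function names and the list of compared fields are likewise hypothetical.

```python
ORIGINAL = """\
/dev/sde1:
     Raid Level : raid10
   Raid Devices : 8
    Data Offset : 262144 sectors
         Layout : near=2
     Chunk Size : 512K
"""

# Invented values for illustration only; not taken from the thread.
RECREATED = """\
/dev/dm-0:
     Raid Level : raid10
   Raid Devices : 8
    Data Offset : 2048 sectors
         Layout : near=2
     Chunk Size : 512K
"""

# Fields whose values must match for the old data to line up again.
GEOMETRY_KEYS = ("Raid Level", "Raid Devices", "Data Offset",
                 "Layout", "Chunk Size")


def parse_examine(text):
    """Turn 'Key : value' lines from mdadm --examine output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # lines without ' : ' (e.g. the device header) are skipped
            fields[key.strip()] = value.strip()
    return fields


def differing_fields(before, after, keys=GEOMETRY_KEYS):
    """Return {key: (before, after)} for every listed key that differs."""
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


if __name__ == "__main__":
    diff = differing_fields(parse_examine(ORIGINAL), parse_examine(RECREATED))
    for key, (old, new) in diff.items():
        print(f"{key}: {old} -> {new}")
```

In practice the two inputs would be the saved --examine output of an original partition and of the matching overlay device after --create. Any field this reports means the recreated array does not see the data where the original array put it.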
> Cheers,
>     Robin