Re: reconstruct raid superblock
From: Majed B. <hidden>
Date: 2009-12-17 18:07:34
Before you start rebuilding a new array, I suggest you install the smartmontools package, run

    smartctl -a /dev/sdx

on each disk, and make sure that no errors are reported. You might run into problems if your disks have bad sectors on them.

If your disks don't have any test logs from before, you should run a long or offline test to make sure they're fully tested:

    smartctl -t offline /dev/sdx

You should also configure smartd to monitor the disks and run tests periodically.

On Thu, Dec 17, 2009 at 7:17 PM, Carl Karsten [off-list ref] wrote:
On Thu, Dec 17, 2009 at 9:40 AM, Majed B. [off-list ref] wrote:

> I'm assuming you ran the command with the 2 external disks added to the array. One question before proceeding: when you removed these 2 externals, were there any changes on the array? Did you add/delete/modify any files or rename them?

Shut down the box, unplugged drives, booted box.

> What do you mean the 2 externals have had mkfs run on them? Is this AFTER you removed the disks from the array? If so, they're useless now.

That's what I figured.

> The names of the disks have changed, and the names recorded in the superblocks differ from what udev now reports:
>
>     sde was named sdg
>     sdf is sdf
>     sdb is sdb
>     sdc is sdc
>     sdd is sdd
>
> According to the listing above, you have superblock info on sdb, sdc, sdd, sde and sdf: 5 disks out of 7, one of which is a spare. sdb was a spare and, according to the other disks' info, it didn't resync, so it has no useful data to aid in recovery. So you're left with 4 out of 6 disks + 1 spare.
>
> You have a chance of running the array in degraded mode using sde, sdc, sdd, sdf, assuming these disks are sane. Try running this command:
>
>     mdadm -Af /dev/md0 /dev/sde /dev/sdc /dev/sdd /dev/sdf

mdadm: forcing event count in /dev/sdf(1) from 97276 upto 580158
mdadm: /dev/md0 has been started with 4 drives (out of 6).
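
For anyone following along: the forced assembly above hinges on the Events counter in each member's superblock (sdf was at 97276, the other three at 580158). A member that far behind holds stale data, so a forced start does not guarantee a consistent filesystem. Below is a minimal sketch for listing the counters side by side. It assumes the usual multi-line `mdadm -E` text on stdin, and the function name `events_per_device` is made up for illustration.

```shell
#!/bin/sh
# Sketch: print "<device> <events>" for every superblock found in
# `mdadm -E` output read from stdin. Assumes the normal layout where
# each field is on its own line, e.g. "         Events : 580158".
events_per_device() {
    awk '
        # Block header such as "/dev/sdc:" starts a new device section.
        /^\/dev\/[a-z0-9]+:/ { dev = $1; sub(/:$/, "", dev) }
        # The Events line carries the update counter for that device.
        $1 == "Events" && $2 == ":" { print dev, $3 }
    '
}

# Example (needs root; sorts the stalest member first):
#   mdadm -E /dev/sd[b-f] | events_per_device | sort -k2,2n
```

Members whose counters match were last updated together; anything lower dropped out of the array earlier.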

> then check: cat /proc/mdstat

root@dhcp128:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdf[1] sde[5] sdd[3] sdc[2]
      5860549632 blocks level 6, 64k chunk, algorithm 2 [6/4] [_UUU_U]

unused devices: <none>
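
In the mdstat output above, [6/4] means 4 of 6 members are active, and each character of [_UUU_U] stands for one raid slot in order: U is up, _ is missing. A small sketch for turning that map into slot numbers follows; the helper name `missing_slots` is hypothetical, and it assumes the map is passed as a single argument.

```shell
#!/bin/sh
# Sketch: list the zero-based raid slots shown as missing ("_")
# in a /proc/mdstat status map such as "[_UUU_U]".
missing_slots() {
    printf '%s\n' "$1" | tr -d '[]' | awk '{
        out = ""
        for (i = 1; i <= length($0); i++)
            if (substr($0, i, 1) == "_")
                out = out (out == "" ? "" : " ") (i - 1)
        print out
    }'
}

missing_slots "[_UUU_U]"    # prints: 0 4
```

That agrees with the kernel log in this thread, which reports raid disks 1, 2, 3 and 5 as operational.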

> If the remaining disks are sane, it should run the array in degraded mode. Hopefully.

dmesg

[31828.093953] md: md0 stopped.
[31838.929607] md: bind<sdc>
[31838.931455] md: bind<sdd>
[31838.932073] md: bind<sde>
[31838.932376] md: bind<sdf>
[31838.973346] raid5: device sdf operational as raid disk 1
[31838.973349] raid5: device sde operational as raid disk 5
[31838.973351] raid5: device sdd operational as raid disk 3
[31838.973353] raid5: device sdc operational as raid disk 2
[31838.973787] raid5: allocated 6307kB for md0
[31838.974165] raid5: raid level 6 set md0 active with 4 out of 6 devices, algorithm 2
[31839.066014] RAID5 conf printout:
[31839.066016] --- rd:6 wd:4
[31839.066018] disk 1, o:1, dev:sdf
[31839.066020] disk 2, o:1, dev:sdc
[31839.066022] disk 3, o:1, dev:sdd
[31839.066024] disk 5, o:1, dev:sde
[31839.066066] md0: detected capacity change from 0 to 6001202823168
[31839.066188] md0: p1

root@dhcp128:/media# fdisk -l /dev/md0

Disk /dev/md0: 6001.2 GB, 6001202823168 bytes
255 heads, 63 sectors/track, 729604 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x96af0591

    Device Boot      Start         End      Blocks   Id  System
/dev/md0p1               1      182401  1465136001   83  Linux

and now the bad news:

mount /dev/md0p1 md0p1
mount: wrong fs type, bad option, bad superblock on /dev/md0p1

[32359.038796] raid5: Disk failure on sde, disabling device.
[32359.038797] raid5: Operation continuing on 3 devices.
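
As a sanity check on the "detected capacity change" figure above: RAID6 spends two members' worth of space on parity, so usable size is (members - 2) times the per-device size. A quick sketch of that arithmetic, using the Used Dev Size (in KiB) from the superblocks in this thread; the function name `raid6_bytes` is just for illustration.

```shell
#!/bin/sh
# Sketch: usable RAID6 capacity in bytes.
#   $1 = number of raid devices
#   $2 = per-device size in KiB ("Used Dev Size" in mdadm -E)
raid6_bytes() {
    devices=$1
    dev_kib=$2
    # Two devices' worth of space holds parity; KiB -> bytes is * 1024.
    echo $(( (devices - 2) * dev_kib * 1024 ))
}

raid6_bytes 6 1465137408    # prints 6001202823168
```

The result matches both the dmesg line and the size fdisk reports for /dev/md0, so the array geometry itself was assembled as recorded.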

> If that doesn't work, I'd say you're better off scrapping & restoring your data back onto a new array rather than waste more time fiddling with superblocks.

Yep. Starting that now.

This is exactly what I was expecting - very few things to try (like 1) and a very clear pass/fail test.

Thanks for helping me get through this.

On Thu, Dec 17, 2009 at 6:06 PM, Carl Karsten [off-list ref] wrote:
I brought back the 2 externals, which have had mkfs run on them, but maybe the extra superblocks will help (doubt it, but couldn't hurt)

root@dhcp128:/media# mdadm -E /dev/sd[a-z]
mdadm: No md superblock detected on /dev/sda.
/dev/sdb:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
  Creation Time : Wed Mar 25 21:04:08 2009
     Raid Level : raid6
  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0
    Update Time : Tue Mar 31 23:08:02 2009
          State : clean
 Active Devices : 5
Working Devices : 6
 Failed Devices : 1
  Spare Devices : 1
       Checksum : a4fbb93a - correct
         Events : 8430
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     6       8       16        6      spare   /dev/sdb
   0     0       8        0        0      active sync   /dev/sda
   1     1       8       64        1      active sync   /dev/sde
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd
   4     4       0        0        4      faulty removed
   5     5       8       80        5      active sync   /dev/sdf
   6     6       8       16        6      spare   /dev/sdb

/dev/sdc:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
  Creation Time : Wed Mar 25 21:04:08 2009
     Raid Level : raid6
  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
   Raid Devices : 6
  Total Devices : 4
Preferred Minor : 0
    Update Time : Sun Jul 12 11:31:47 2009
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 2
  Spare Devices : 0
       Checksum : a59452db - correct
         Events : 580158
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       32        2      active sync   /dev/sdc
   0     0       8        0        0      active sync   /dev/sda
   1     1       0        0        1      faulty removed
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd
   4     4       0        0        4      faulty removed
   5     5       8       96        5      active sync   /dev/sdg

/dev/sdd:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
  Creation Time : Wed Mar 25 21:04:08 2009
     Raid Level : raid6
  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
   Raid Devices : 6
  Total Devices : 4
Preferred Minor : 0
    Update Time : Sun Jul 12 11:31:47 2009
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 2
  Spare Devices : 0
       Checksum : a59452ed - correct
         Events : 580158
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       48        3      active sync   /dev/sdd
   0     0       8        0        0      active sync   /dev/sda
   1     1       0        0        1      faulty removed
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd
   4     4       0        0        4      faulty removed
   5     5       8       96        5      active sync   /dev/sdg

/dev/sde:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
  Creation Time : Wed Mar 25 21:04:08 2009
     Raid Level : raid6
  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
   Raid Devices : 6
  Total Devices : 4
Preferred Minor : 0
    Update Time : Sun Jul 12 11:31:47 2009
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 2
  Spare Devices : 0
       Checksum : a5945321 - correct
         Events : 580158
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5       8       96        5      active sync   /dev/sdg
   0     0       8        0        0      active sync   /dev/sda
   1     1       0        0        1      faulty removed
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd
   4     4       0        0        4      faulty removed
   5     5       8       96        5      active sync   /dev/sdg

/dev/sdf:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
  Creation Time : Wed Mar 25 21:04:08 2009
     Raid Level : raid6
  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
   Raid Devices : 6
  Total Devices : 5
Preferred Minor : 0
    Update Time : Wed Apr 8 11:13:32 2009
          State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 1
  Spare Devices : 0
       Checksum : a5085415 - correct
         Events : 97276
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     1       8       80        1      active sync   /dev/sdf
   0     0       8        0        0      active sync   /dev/sda
   1     1       8       80        1      active sync   /dev/sdf
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd
   4     4       0        0        4      faulty removed
   5     5       8       96        5      active sync   /dev/sdg

mdadm: No md superblock detected on /dev/sdg.

On Thu, Dec 17, 2009 at 8:39 AM, Majed B. [off-list ref] wrote:
You can't copy and change bytes to identify disks.

To check which disks belong to an array, do this:

    mdadm -E /dev/sd[a-z]

The disks that you get info from belong to the existing array(s).

In the first email you sent, you included an examine output for one of the disks that listed another disk as a spare (sdb). The output of examine should shed more light.

On Thu, Dec 17, 2009 at 5:15 PM, Carl Karsten [off-list ref] wrote:
On Thu, Dec 17, 2009 at 4:35 AM, Majed B. [off-list ref] wrote:

> I have misread the information you've provided, so allow me to correct myself:
>
> You're running a RAID6 array, with 2 disks lost/failed. Any disk loss after that will cause data loss since you have no redundancy (2 disks died).

Right - but I am not sure if data loss has occurred, where data is the data being stored on the raid, not the raid metadata.

My guess is I need to copy the raid superblock from one of the other disks (say sdb), find the bytes that identify the disk and change from sdb to sda.

> I believe it's still possible to reassemble the array; you only need to remove the MBR. See this page for information:
> http://www.cyberciti.biz/faq/linux-how-to-uninstall-grub/
>
>     dd if=/dev/zero of=/dev/sdX bs=446 count=1
>
> Before proceeding, provide the output of cat /proc/mdstat

root@dhcp128:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
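
A note on the dd step quoted above: bs=446 count=1 covers only the boot-code area of the first sector, leaving the 64-byte partition table and the 2-byte signature alone. The source has to be /dev/zero, since reading /dev/null yields no bytes and dd would write nothing. Here is a minimal sketch that demonstrates this on a scratch file rather than a real disk. The temporary file stands in for the disk and is purely an assumption for illustration; conv=notrunc is needed only because the target is a regular file.

```shell
#!/bin/sh
# Sketch: zero the 446-byte MBR boot-code area on a scratch FILE
# (not a real disk) and confirm the rest of the sector is untouched.
img=$(mktemp)

# Fake 512-byte sector filled with "X" so overwritten bytes are visible.
dd if=/dev/zero bs=512 count=1 2>/dev/null | tr '\0' 'X' > "$img"

# Overwrite bytes 0..445 with zeros; notrunc keeps the file at 512 bytes.
dd if=/dev/zero of="$img" bs=446 count=1 conv=notrunc 2>/dev/null

# Count the bytes that are not NUL, i.e. the part dd did not touch.
kept=$(tr -d '\0' < "$img" | wc -c | tr -d ' ')
echo "bytes left untouched: $kept"    # prints: bytes left untouched: 66

rm -f "$img"
```

On a real device, double-check the of= target before running anything like this; dd does not ask for confirmation.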

> Is the array currently running degraded or is it suspended?

Um, not running, not sure I would call it suspended.

> What happened to the spare disk assigned?

I don't understand.

> Did it finish resyncing before you installed grub on the wrong disk?

I think so. I am fairly sure I could assemble the array before I installed grub.

On Thu, Dec 17, 2009 at 8:21 AM, Majed B. [off-list ref] wrote:
If your other disks are sane and you are able to run a degraded array, then you can remove grub using dd, then re-add the disk to the array.

To clear the first 1MB of the disk:

    dd if=/dev/zero of=/dev/sdx bs=1M count=1

Replace sdx with the disk name that has grub.

On Dec 17, 2009 6:53 AM, "Carl Karsten" [off-list ref] wrote:

I took over a box that had 1 ide boot drive, 6 sata raid drives (4 internal, 2 external.) I believe the 2 externals were redundant, so could be removed. So I did, and mkfs-ed them. Then I installed ubuntu to the ide, and installed grub to sda, which turns out to be the first sata. Which would be fine if the raid was on sda1, but it is on sda, and now the raid won't assemble. No surprise, and I do have a backup of the data spread across 5 external drives. But before I abandon the array, I am wondering if I can fix it by recreating mdadm's metadata on sda, given I have sd[bcd] to work with.

Any suggestions?

root@dhcp128:~# mdadm --examine /dev/sd[abcd]
mdadm: No md superblock detected on /dev/sda.
/dev/sdb:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
  Creation Time : Wed Mar 25 21:04:08 2009
     Raid Level : raid6
  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 0
    Update Time : Tue Mar 31 23:08:02 2009
          State : clean
 Active Devices : 5
Working Devices : 6
 Failed Devices : 1
  Spare Devices : 1
       Checksum : a4fbb93a - correct
         Events : 8430
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     6       8       16        6      spare   /dev/sdb
   0     0       8        0        0      active sync   /dev/sda
   1     1       8       64        1      active sync   /dev/sde
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd
   4     4       0        0        4      faulty removed
   5     5       8       80        5      active sync
   6     6       8       16        6      spare   /dev/sdb

/dev/sdc:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
  Creation Time : Wed Mar 25 21:04:08 2009
     Raid Level : raid6
  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
   Raid Devices : 6
  Total Devices : 4
Preferred Minor : 0
    Update Time : Sun Jul 12 11:31:47 2009
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 2
  Spare Devices : 0
       Checksum : a59452db - correct
         Events : 580158
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       32        2      active sync   /dev/sdc
   0     0       8        0        0      active sync   /dev/sda
   1     1       0        0        1      faulty removed
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd
   4     4       0        0        4      faulty removed
   5     5       8       96        5      active sync

/dev/sdd:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
  Creation Time : Wed Mar 25 21:04:08 2009
     Raid Level : raid6
  Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
     Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
   Raid Devices : 6
  Total Devices : 4
Preferred Minor : 0
    Update Time : Sun Jul 12 11:31:47 2009
          State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 2
  Spare Devices : 0
       Checksum : a59452ed - correct
         Events : 580158
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     3       8       48        3      active sync   /dev/sdd
   0     0       8        0        0      active sync   /dev/sda
   1     1       0        0        1      faulty removed
   2     2       8       32        2      active sync   /dev/sdc
   3     3       8       48        3      active sync   /dev/sdd
   4     4       0        0        4      faulty removed
   5     5       8       96        5      active sync

--
Carl K
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Majed B.
--
Carl K
--
Majed B.
--
Carl K
--
Majed B.
--
Carl K
--
Majed B.