Re: Help with corrupted MDADM Raid6
From: ptschack . <hidden>
Date: 2014-06-14 11:19:57
Hi Neil,

regrettably, I do not have logs from Jun 9th. This is what happened, in detail:

Before I grew the RAID, I made a backup of the system drive (sometime around the beginning of May). Then I grew the RAID and the dm-crypt container on it. I then noticed that ext4 filesystems cannot be grown above a certain limit, which is why I decided to convert to BTRFS.

Prior to Jun 9th I upgraded Ubuntu from 12.04 LTS to 14.04 LTS, because I wanted the newest BTRFS utils for the conversion. The conversion went smoothly, but the Ubuntu upgrade messed with some services running on the server (e.g. various configs for web apps, nothing to do with the RAID). So I wanted to do a fresh install. I didn't make a new backup of the system, because I had the old backup, which had worked before.

I attempted the fresh install, looking at the disks with GParted beforehand (as I said earlier, my theory is that GParted might have messed up some of the md superblocks). After the fresh install, I wasn't able to start the RAID (the error message was "input/output error"). So I thought I would just restore the old backup, since that had worked perfectly, and make my way from there.

After the restore, the system asked me whether I wanted to start a degraded RAID. I thought it meant the RAID was degraded because of the failing drive, and said yes. It then showed me a RAID with 6 drives, all spares. At this point the panic started to set in :(

I have attached some log excerpts from the beginning of May, before I made the backup and while the old RAID was still functioning (kern.log and syslog, grepped for 'md').
Furthermore, searching for the superblock with od gave me the following:

od -x /dev/sdh | grep '4efc a92b'
20234525260 8a2a c251 a28b 2f92 f63e 8d72 4efc a92b
103362752200 4efc a92b 3412 ad92 b451 bc40 5897 d215

od -x /dev/sdi | grep '4efc a92b'
135674640060 4efc a92b 89de a9d8 d2b8 395e 6f37 4597

I don't think those are the superblocks, but rather the "magic number" being present somewhere on the drive :(

Doing further research I found this:
http://kevin.deldycke.com/2007/03/how-to-recover-a-raid-array-after-having-zero-ized-superblocks/

Is there any "safe" way to restore the superblocks, or is re-creating the RAID my final option?

Thanks again,
-P.
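As a rough cross-check of the od hits above, here is a minimal Python sketch that converts od's octal offsets into byte offsets and tests whether the magic lands on a 512-byte sector boundary. It is not part of mdadm, and the function names are made up for illustration. It rests on the assumption that an md superblock begins with the magic and starts on a sector boundary, which is my understanding of the 0.90 and 1.x layouts; a hit anywhere else is more likely to be stray data.

```python
# Sketch only: check whether `od -x` hits for the md magic are sector-aligned.
# Assumption: an md superblock starts with the magic on a 512-byte boundary.
SECTOR = 512

def magic_offsets(od_line, magic=("4efc", "a92b")):
    """Return the byte offsets of the magic within one `od -x` output line."""
    fields = od_line.split()
    base = int(fields[0], 8)           # od prints offsets in octal by default
    words = fields[1:]
    hits = []
    for i in range(len(words) - 1):
        if (words[i], words[i + 1]) == magic:
            hits.append(base + 2 * i)  # each `od -x` word covers 2 bytes
    return hits

def is_sector_aligned(offset):
    """True if the offset is a whole multiple of the sector size."""
    return offset % SECTOR == 0

# The three hits quoted in this message.
od_hits = {
    "/dev/sdh": [
        "20234525260 8a2a c251 a28b 2f92 f63e 8d72 4efc a92b",
        "103362752200 4efc a92b 3412 ad92 b451 bc40 5897 d215",
    ],
    "/dev/sdi": [
        "135674640060 4efc a92b 89de a9d8 d2b8 395e 6f37 4597",
    ],
}

for dev, lines in od_hits.items():
    for line in lines:
        for off in magic_offsets(line):
            print(dev, "byte offset", off,
                  "offset within sector", off % SECTOR,
                  "aligned" if is_sector_aligned(off) else "not aligned")
```

For the three hits above, the magic sits 188, 128 and 48 bytes into its sector, so none of them is sector-aligned, which fits the suspicion that these are not superblocks.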
Well, you've definitely made progress. You've found 6 of the devices.
They all look consistent and it appears the array was completely coherent at
Mon Jun 9 21:52:48 2014
You think that the 7th device is dead or dying, so you just need to find 2
more (1 would do).
Presumably these are sdh and sdi, but it is very strange that we cannot find
the superblock on either of them.
When was the last time the machine was rebooted prior to the date given
(9th Jun)?
Do you have boot logs from that time? Which lines contain 'md'?
Particularly "bind" lines will show you exactly which devices were included.
Maybe also try
od -x /dev/sdh | grep '4efc a92b'
If the superblock is at some strange location, that might find it.
NeilBrown

Attachments
- syslog.txt [text/plain] 5792 bytes
- kern.log [text/x-log] 5970 bytes