Thread (8 messages), 2 authors, 2014-06-14

Re: Help with corrupted MDADM Raid6

From: ptschack . <hidden>
Date: 2014-06-14 11:19:57

Hi Neil,

Regrettably, I do not have logs from Jun 9th. This is what happened, in detail:

Before I grew the RAID, I made a backup of the system drive (sometime
around the beginning of May). Then I grew the RAID and the dm-crypt
container on it.
I then noticed that ext4 filesystems cannot be grown above a certain
limit, which is why I decided to convert to BTRFS.
Prior to Jun 9th I upgraded Ubuntu from 12.04 LTS to 14.04 LTS. The
reason was that I wanted the newest BTRFS utils for the conversion.
The conversion went smoothly, but the Ubuntu upgrade messed with some
services running on the server (e.g. various configs for web apps,
nothing to do with the raid). So I wanted to do a fresh install. I
didn't do a backup of the system, because I had the old backup which
had worked before.

I attempted the fresh install, looking at the disks with GParted
beforehand (as I said earlier, my theory is that GParted might have
messed up some of the md superblocks).
So after the fresh install, I wasn't able to start the RAID (error
message was input/output error).
So I thought I'll just restore the old backup, since that worked
perfectly, and then make my way from there.

After the restore, the system asked me if I wanted to start a degraded
RAID. I thought it meant the RAID was degraded because of the failing
drive, and said yes.
It then showed me a RAID with 6 drives, all spares. At this point the
panic started to set in :(

I have attached some log excerpts from the beginning of May, from before
I made the backup, when the old RAID was still functioning (kern.log and
syslog, grepped for 'md').

Furthermore, searching for the superblock with od gave me the following:

od -x /dev/sdh | grep '4efc a92b'

20234525260 8a2a c251 a28b 2f92 f63e 8d72 4efc a92b
103362752200 4efc a92b 3412 ad92 b451 bc40 5897 d215

od -x /dev/sdi | grep '4efc a92b'

135674640060 4efc a92b 89de a9d8 d2b8 395e 6f37 4597

I don't think those are the superblocks, but rather the "magic number"
being present somewhere on the drive :(
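
One rough way to test that suspicion is to check where each hit sits on the disk. The sketch below converts the od offsets (which od prints in octal) to byte offsets and checks them against a 512-byte sector boundary. It rests on the assumption that an md superblock, and therefore its magic number, starts on a sector boundary; the helper name magic_offset is made up for illustration.

```python
# Offsets are octal, exactly as printed by od in the output above.
SECTOR = 512  # assumption: md superblocks start on a 512-byte sector boundary

# (device, od line offset in octal, byte position of the magic within the 16-byte line)
hits = [
    ("/dev/sdh", "20234525260", 12),   # magic is the 7th and 8th 16-bit word
    ("/dev/sdh", "103362752200", 0),   # magic is at the start of the line
    ("/dev/sdi", "135674640060", 0),   # magic is at the start of the line
]


def magic_offset(octal_line_offset, position_in_line):
    """Return the absolute byte offset of the magic number on the device."""
    return int(octal_line_offset, 8) + position_in_line


for device, line_offset, position in hits:
    offset = magic_offset(line_offset, position)
    aligned = offset % SECTOR == 0
    print(f"{device}: byte offset {offset}, sector-aligned: {aligned}")
```

None of the three hits lands on a sector boundary, which would be consistent with them being chance occurrences of the byte pattern rather than real superblocks.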

Doing further research I found this:
http://kevin.deldycke.com/2007/03/how-to-recover-a-raid-array-after-having-zero-ized-superblocks/

Is there any "safe" way to restore the superblocks, or is re-creating
the RAID my final option?

Thanks again,
-P.
Well, you've definitely made progress.  You've found 6 of the devices.
They all look consistent and it appears the array was completely coherent at
    Mon Jun  9 21:52:48 2014

You think that the 7th device is dead or dying, so you just need to find 2
more (1 would do).

Presumably these are sdh and sdi, but it is very strange that we cannot find
the superblock on either of them.
When was the last time the machine was rebooted prior to the date given
(9th Jun)?
Do you have boot logs from that time?  What lines contain 'md'?
Particularly "bind" lines will show you exactly which devices were included.

Maybe also try

  od -x /dev/sdh | grep '4efc a92b'

If the superblock is at some strange location, that might find it.
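
A narrower variant of the same search, as a minimal sketch: instead of matching the pattern anywhere, look only at the first four bytes of every 512-byte sector. This assumes the magic 0xa92b4efc is stored little-endian (bytes fc 4e 2b a9, which is what '4efc a92b' in od -x output corresponds to on a little-endian machine) and that a superblock starts on a sector boundary. The function name find_aligned_magic is hypothetical, and the code only reads from the stream.

```python
import io

MD_MAGIC = bytes.fromhex("fc4e2ba9")  # 0xa92b4efc, assumed little-endian on disk
SECTOR = 512                          # assumed superblock alignment


def find_aligned_magic(stream, chunk_sectors=2048):
    """Yield byte offsets of sectors whose first four bytes equal MD_MAGIC."""
    base = 0
    while True:
        chunk = stream.read(SECTOR * chunk_sectors)
        if not chunk:
            break
        # Only inspect sector boundaries inside this chunk.
        for pos in range(0, len(chunk), SECTOR):
            if chunk[pos:pos + 4] == MD_MAGIC:
                yield base + pos
        base += len(chunk)


# Demonstration on an in-memory image: one aligned magic, one unaligned.
image = bytes(1024) + MD_MAGIC + bytes(508) + b"xx" + MD_MAGIC + bytes(506)
print(list(find_aligned_magic(io.BytesIO(image))))
```

On a real disk the stream would be the device opened read-only, for example open("/dev/sdh", "rb"), and any offset reported would still need to be checked by hand before trusting it.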

NeilBrown

