raid10 messed up by false multipath setup (was: raid10 messed up filesystem, lvm lv ok)
From: Ask Bjørn Hansen <hidden>
Date: 2008-01-21 10:32:08
On Jan 19, 2008, at 3:44 AM, Ask Bjørn Hansen wrote:

Replying to myself with an update, mostly for the sake of the archives (I went through the linux-raid mail from the last year yesterday while waiting for my raw-partition backups to finish).
Yesterday I mentioned[1] my trouble with the multipath detection code in the Fedora rescue mode messing up my raid.
[...]
I suspect that maybe the layout of the md device got messed up? How can I find out if that's the case? Would it be possible to recover from (assuming all the data still is on some of the disks).
I realized that of the 11 disks (9 in the raid, 2 spares) one of the disks affected by the "fake multipath" mishap was a spare, so after backing up all the raw partitions[2] I re-created the raid in place with the other affected disk marked missing and it seems like the file system is more or less okay. Yes, I'm doing a backup now. :-)

Lessons:

1) Do backups of your raid'ed data. Yes, it can be a pain but figure it out.

2) Keep your root partition on a simple raid1 (or on a lvm group that's on a simple raid1).

3) When the raid goes @#$ - don't panic, make sure nothing is being written to the disks and stop. (Some years ago I lost a raid5 to the "oops, had a read-error, drop the last disk" issue and I suspect I could have saved it had I been patient and stopped working on it until I was more awake).

4) Have/make copies of the mdadm -D / -E output.

5) If you care about the data, do a backup of your raw partitions before trying to restore.

6) The "create the raid on top of the old raid" trick saves the day again (for a while I had some kind of cabling problem on a box with a raid6 - I lost track of how many times I did the recreate thing).
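For lesson 4, here is a minimal sketch of how the mdadm -D / -E output could be saved ahead of time. It only prints the commands rather than running them, so they can be reviewed first. The array name /dev/md0, the member partitions, the output directory /root/raid-notes and the function name plan_metadata_dump are all made up for illustration; substitute your own.

```shell
#!/bin/sh
# Sketch: print (do not run) the commands that would save mdadm metadata
# to text files. /dev/md0, the partitions and /root/raid-notes are
# hypothetical placeholders, not taken from the original setup.

plan_metadata_dump() {
    # $1 = output directory, $2 = md device, remaining args = member partitions
    out=$1
    md=$2
    shift 2
    # -D (--detail) describes the assembled array
    echo "mdadm -D $md > $out/$(basename "$md").detail"
    for part in "$@"; do
        # -E (--examine) prints the md superblock stored on one member
        echo "mdadm -E $part > $out/$(basename "$part").examine"
    done
}

plan_metadata_dump /root/raid-notes /dev/md0 /dev/sda5 /dev/sdb5
```

Keeping the resulting files somewhere other than the array itself is the point: the -E output records the layout details you would want at hand before attempting a re-create.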
Secondary question: I'm doing a "dd if=/dev/sdX5 bs=256k > /backup/sdX5" for each disk -- is there a way to run mdadm on the copies and experiment on those? (It took ~forever to copy a terabyte of the raw partitions).
(For the archives) - I didn't try it, but setting up the disk images as loop devices should work. I didn't think of that yesterday. - ask
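As an untested sketch of the loop device idea: the function below prints one losetup command per backup image, attaching each read-only so the copy cannot be modified by accident. The image names sda5 and sdb5 and the function name plan_loop_setup are hypothetical; /backup is the directory from the dd command above. Nothing is executed, the commands are only printed.

```shell
#!/bin/sh
# Sketch: print (do not run) the commands that would attach raw-partition
# backup images as loop devices. Image names are hypothetical placeholders.

plan_loop_setup() {
    # $1 = directory holding the dd images, remaining args = image file names
    dir=$1
    shift
    for img in "$@"; do
        # -f: use the first free loop device
        # --show: print the name of the device that was chosen
        # -r: attach read-only
        echo "losetup -f --show -r $dir/$img"
    done
}

plan_loop_setup /backup sda5 sdb5
```

Each losetup command, when run as root, prints the loop device it picked (for example /dev/loop0), and "mdadm -E" on that device should then show the same superblock as the original partition. Actually re-creating an array on the copies would need writable loop devices (drop the -r), ideally on a second copy of the images.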
[1] http://marc.info/?l=linux-raid&m=120065542429935&w=2
[2] And oh man am I glad I backed them up. On my first attempt at recreating the raid I forgot the md device parameter and --assume-clean, so it created a raid device on one of my source partitions and immediately started syncing at ~120MB/sec. Restoring the partitions from the backup worked fine fortunately.

--
http://develooper.com/ - http://askask.com/