Re: On RAID5 read error during syncing - array .A.A
From: Robin Hill <hidden>
Date: 2014-12-08 09:48:41
On Sat Dec 06, 2014 at 03:49:10PM -0500, Emery Guevremont wrote:
On Sat, Dec 6, 2014 at 1:56 PM, Robin Hill [off-list ref] wrote:
On Sat Dec 06, 2014 at 01:35:50pm -0500, Emery Guevremont wrote:
The long story and what I've done.

/dev/md0 is assembled with 4 drives:

/dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3

2 weeks ago, mdadm marked /dev/sda3 as failed. cat /proc/mdstat showed
_UUU. smartctl also confirmed that the drive was dying. So I shut down
the server until I received a replacement drive.

This week, I replaced the dying drive with my new drive. Booted into
single user mode and did this:

mdadm --manage /dev/md0 --add /dev/sda3

A cat of /proc/mdstat confirmed the resyncing process. The last time I
checked it was up to 11%. A few minutes later, I noticed that the
syncing had stopped. A read error message on /dev/sdd3 (I have a pic of
it if interested) appeared on the console. It appears that /dev/sdd3
might be going bad. A cat /proc/mdstat showed _U_U. Now I panicked, and
decided to leave everything as is and go to bed.

The next day, I shut down the server and rebooted with a live USB distro
(Ubuntu rescue remix). After booting into the live distro, a cat
/proc/mdstat showed that my /dev/md0 was detected but all drives had an
(S) next to them, i.e. /dev/sda3 (S)... Naturally I don't like the looks
of this.

I ran ddrescue to copy /dev/sdd onto my new replacement disk (/dev/sda).
Everything worked; ddrescue got only one read error, but was eventually
able to read the bad sector on a retry. I followed up by also cloning
sdb and sdc with ddrescue.

So now I have cloned copies of sdb, sdc and sdd to work with. Currently,
running mdadm --assemble --scan will activate my array, but all drives
are added as spares. Running mdadm --examine on each drive shows the
same Array UUID number, but the Raid Devices is 0 and the raid level is
-unknown- for some reason. The rest seems fine and makes sense. I
believe I could re-assemble my array if I could define the raid level
and raid devices.

I wanted to know if there is a way to restore my superblocks from the
examine command I ran at the beginning? If not, what mdadm create
command should I run? Also please let me know if drive ordering is
important, and how I can determine this with the examine output I've
got?

Thank you.

Have you tried --assemble --force? You'll need to make sure the array's
stopped first, but that's the usual way to get the array back up and
running in that sort of situation.

If that doesn't work, stop the array again and post:
- the output from mdadm --assemble --force --verbose /dev/md0 /dev/sd[bcd]3
- any dmesg output corresponding with the above
- --examine output for all disks
- kernel and mdadm versions

Good luck,
Robin
You'll see from the examine output that the raid level and devices
aren't defined; also notice the role of each drive. The examine output
(I attached 4 files) that I took right after the read error during the
syncing process seems to show a more accurate superblock. Here's also
the output of mdadm --detail /dev/md0 that I took when I got the first
error:
ARRAY /dev/md/0 metadata=1.2 UUID=cf9db8fa:0c2bb553:46865912:704cceae
name=runts:0
spares=1
Here's the output of how things currently are:
mdadm --assemble --force /dev/md127 /dev/sdb3 /dev/sdc3 /dev/sdd3
mdadm: /dev/md127 assembled from 0 drives and 3 spares - not enough to
start the array.
dmesg
[27903.423895] md: md127 stopped.
[27903.434327] md: bind<sdc3>
[27903.434767] md: bind<sdd3>
[27903.434963] md: bind<sdb3>
cat /proc/mdstat
root@ubuntu:~# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [linear] [multipath] [raid0]
[raid1] [raid10]
md127 : inactive sdb3[4](S) sdd3[0](S) sdc3[5](S)
5858387208 blocks super 1.2
mdadm --examine /dev/sd[bcd]3
/dev/sdb3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
Name : runts:0
Creation Time : Tue Jul 26 03:27:39 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : b2bf0462:e0722254:0e233a72:aa5df4da
Update Time : Sat Dec 6 12:46:40 2014
Checksum : 5e8cfc9a - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
/dev/sdc3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
Name : runts:0
Creation Time : Tue Jul 26 03:27:39 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 390bd4a2:07a28c01:528ed41e:a9d0fcf0
Update Time : Sat Dec 6 12:46:40 2014
Checksum : f69518c - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
/dev/sdd3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
Name : runts:0
Creation Time : Tue Jul 26 03:27:39 2011
Raid Level : -unknown-
Raid Devices : 0
Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 92589cc2:9d5ed86c:1467efc2:2e6b7f09
Update Time : Sat Dec 6 12:46:40 2014
Checksum : 571ad2bd - correct
Events : 1
Device Role : spare
Array State : ('A' == active, '.' == missing)
and finally kernel and mdadm versions:
uname -a
Linux ubuntu 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:41:14 UTC
2012 i686 i686 i386 GNU/Linux
mdadm -V
mdadm - v3.2.3 - 23rd December 2011

The missing data looks similar to a bug fixed a couple of years ago
(http://neil.brown.name/blog/20120615073245), though the kernel versions
don't match and the missing data is somewhat different - it may be that
the relevant patches were backported to the vendor kernel you're using.
With that data missing there's no way to assemble though, so a re-create
is required in this case (it's a last resort, but I don't see any other
option).
/dev/sda3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
Name : runts:0 (local to host runts)
Creation Time : Mon Jul 25 23:27:39 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
Array Size : 5858385408 (5586.99 GiB 5998.99 GB)
Used Dev Size : 3905590272 (1862.33 GiB 1999.66 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : b2bf0462:e0722254:0e233a72:aa5df4da
Update Time : Tue Dec 2 23:15:37 2014
Checksum : 5ed5b898 - correct
Events : 3925676
Layout : left-symmetric
Chunk Size : 512K
Device Role : spare
Array State : A.A. ('A' == active, '.' == missing)
/dev/sdb3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
Name : runts:0 (local to host runts)
Creation Time : Mon Jul 25 23:27:39 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
Array Size : 5858385408 (5586.99 GiB 5998.99 GB)
Used Dev Size : 3905590272 (1862.33 GiB 1999.66 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 92589cc2:9d5ed86c:1467efc2:2e6b7f09
Update Time : Tue Dec 2 23:15:37 2014
Checksum : 57638ebb - correct
Events : 3925676
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 0
Array State : A.A. ('A' == active, '.' == missing)
/dev/sdc3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
Name : runts:0 (local to host runts)
Creation Time : Mon Jul 25 23:27:39 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
Array Size : 5858385408 (5586.99 GiB 5998.99 GB)
Used Dev Size : 3905590272 (1862.33 GiB 1999.66 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 390bd4a2:07a28c01:528ed41e:a9d0fcf0
Update Time : Tue Dec 2 23:15:37 2014
Checksum : fb20d8a - correct
Events : 3925676
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 2
Array State : A.A. ('A' == active, '.' == missing)
/dev/sdd3:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : cf9db8fa:0c2bb553:46865912:704cceae
Name : runts:0 (local to host runts)
Creation Time : Mon Jul 25 23:27:39 2011
Raid Level : raid5
Raid Devices : 4
Avail Dev Size : 3905591472 (1862.33 GiB 1999.66 GB)
Array Size : 5858385408 (5586.99 GiB 5998.99 GB)
Used Dev Size : 3905590272 (1862.33 GiB 1999.66 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : clean
Device UUID : 4156ab46:bd42c10d:8565d5af:74856641
Update Time : Tue Dec 2 23:14:03 2014
Checksum : a126853f - correct
Events : 3925672
Layout : left-symmetric
Chunk Size : 512K
Device Role : Active device 1
Array State : AAAA ('A' == active, '.' == missing)
At least you have the previous data anyway, which should allow
reconstruction of the array. The device names have changed between your
two reports though, so I'd advise double-checking which is which before
proceeding.
The reports indicate that the original array order (based on the device
role field) for the four devices was (using device UUIDs as they're
consistent):
92589cc2:9d5ed86c:1467efc2:2e6b7f09
4156ab46:bd42c10d:8565d5af:74856641
390bd4a2:07a28c01:528ed41e:a9d0fcf0
b2bf0462:e0722254:0e233a72:aa5df4da
That would give a current device order of sdd3,sda3,sdc3,sdb3 (I don't
have the current data for sda3, but that's the only missing UUID).
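As a rough sketch of how that ordering can be read off mechanically rather than by eye: the function below takes saved --examine output on stdin and prints one line per member, sorted by role. The name examine_roles is made up for this example, and it assumes the usual layout where each member starts with a "/dev/...:" line followed by its "Device UUID" and "Device Role" fields.

```shell
# Hypothetical helper: read `mdadm --examine` output on stdin and print
# "role<TAB>device-uuid<TAB>device" for each member, sorted by role.
# Members reported as spares get the role "spare" and sort last.
examine_roles() {
    awk '
        /^\/dev\//      { dev = $1; sub(/:$/, "", dev) }   # "/dev/sdb3:" header
        /Device UUID :/ { uuid = $NF }
        /Device Role :/ {
            role = ($4 == "spare") ? "spare" : $NF         # "Active device N" -> N
            print role "\t" uuid "\t" dev
        }
    ' | LC_ALL=C sort
}
```

Something like "cat original-examine-*.txt | examine_roles" (file names are placeholders) lists the UUIDs in slot order; running it again on the current --examine output shows which device name each UUID has now.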
The create command would therefore be (leaving the fourth slot as
"missing", since sdb3 was still only a partly-synced spare):
mdadm -C -l 5 -n 4 -c 512 -e 1.2 -z 1952795136 \
/dev/md0 /dev/sdd3 /dev/sda3 /dev/sdc3 missing
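For reference, the -z value comes from the "Used Dev Size" field of the original report: --examine prints that field in 512-byte sectors, while -z takes KiB, so it is halved. A small sketch of the arithmetic follows (the function names are only for illustration); the second function cross-checks the result against the reported "Array Size", which for RAID5 is the per-device size times one less than the device count.

```shell
# Convert a "Used Dev Size" value (512-byte sectors) to the KiB value for -z.
sectors_to_kib() {
    echo $(( $1 / 2 ))
}

# Usable size in KiB of a RAID5 array: (devices - 1) * per-device size.
raid5_array_kib() {
    echo $(( ($1 - 1) * $2 ))
}

sectors_to_kib 3905590272      # prints 1952795136
raid5_array_kib 4 1952795136   # prints 5858385408, matching "Array Size"
```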
mdadm 3.2.3 should use a data offset of 2048 sectors, the same as your
old array, but you may want to double-check that with a test array on a
couple of loopback devices first. If not, you'll need to grab the
latest release and add the --data-offset parameter to the above create
command. Note that mdadm reads that value as KiB unless a suffix is
given, so 2048 sectors would be --data-offset=1024 (or 1M).
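A loopback check along those lines might look like the sketch below. It is untested against this particular setup: /dev/md99, the /tmp image names and the 100M size are arbitrary choices for the example, it needs root, and the offset mdadm picks may depend on device size in some versions, so a result from small test devices is only indicative.

```shell
# Pull the data offset (in sectors) out of `mdadm --examine` output on stdin.
data_offset_of() {
    awk '/Data Offset :/ { print $4; exit }'
}

# Hypothetical helper (run as root): create a throwaway 4-device RAID5 on
# loop devices with the same chunk size and metadata version as the real
# array, print the data offset mdadm chose, then tear everything down.
check_default_offset() {
    set --
    for i in 0 1 2 3; do
        truncate -s 100M "/tmp/md-test-$i.img"
        set -- "$@" "$(losetup -f --show "/tmp/md-test-$i.img")"
    done
    mdadm -C /dev/md99 -l 5 -n 4 -c 512 -e 1.2 --run "$@"
    mdadm --examine "$1" | data_offset_of
    mdadm --stop /dev/md99
    for loop in "$@"; do losetup -d "$loop"; done
    rm -f /tmp/md-test-[0-3].img
}
```

Nothing runs until check_default_offset is called. If it prints 2048, the installed mdadm matches the old array's offset for that test size.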
You should also follow the instructions for using overlay files at
https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
in order to safely test out the above without risking damage to the
array data.
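In outline, the overlay approach puts a device-mapper snapshot in front of each member so that all writes land in a sparse file instead of on the disk. The wiki page is the reference; what follows is only a minimal sketch of the idea, with made-up function names, an arbitrary 4G overlay size and /tmp paths, and it needs root.

```shell
# Build the device-mapper table for a persistent snapshot of ORIGIN ($2)
# whose writes go to COW ($3). $1 is the origin size in 512-byte sectors;
# "P 8" means persistent, with a chunk size of 8 sectors.
snapshot_table() {
    echo "0 $1 snapshot $2 $3 P 8"
}

# Hypothetical helper (run as root): create an overlay for one member
# device and print the path of the resulting mapper device.
make_overlay() {
    dev=$1
    name=$(basename "$dev")
    truncate -s 4G "/tmp/overlay-$name"              # sparse copy-on-write file
    loop=$(losetup -f --show "/tmp/overlay-$name")
    size=$(blockdev --getsz "$dev")                  # size in 512-byte sectors
    snapshot_table "$size" "$dev" "$loop" | dmsetup create "overlay-$name"
    echo "/dev/mapper/overlay-$name"
}
```

The create command would then be pointed at the /dev/mapper/overlay-* devices instead of the real partitions, so a wrong order or parameter only dirties the overlay files.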
Once you've run the create, run a "fsck -n" on the filesystem to check
that the data looks okay. If not, the order or parameters may be
incorrect - check the --examine output for any differences from the
original results.
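One way to make that comparison less error-prone is to strip both reports down to the fields that describe the array geometry and diff them. A small sketch, where the function name and the choice of fields are my own:

```shell
# Print only the geometry-related fields from `mdadm --examine` output on
# stdin, with surrounding whitespace trimmed, so two reports can be diffed.
geometry_fields() {
    grep -E '^[[:space:]]*(Raid Level|Raid Devices|Used Dev Size|Data Offset|Layout|Chunk Size|Device Role) :' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}
```

For example, with the original report for a device saved in a file (old-sdd3.txt is a placeholder name), "geometry_fields < old-sdd3.txt > old.txt" followed by "mdadm --examine /dev/sdd3 | geometry_fields | diff old.txt -" should print nothing if the re-created superblock matches.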
Cheers,
Robin
--
___
( ' } | Robin Hill [off-list ref] |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |