Thread: 10 messages, 3 authors, 2009-12-17

Re: reconstruct raid superblock

From: Majed B. <hidden>
Date: 2009-12-17 18:07:34

Before you start rebuilding a new array, I suggest you install the
smartmontools package and run smartctl -a /dev/sdx (on each disk) and
make sure that there are no errors reported.

You might run into problems if your disks have bad sectors on them.

If your disks don't have any test logs from before, you should run a
long or offline test to make sure they're fully tested:
smartctl -t offline /dev/sdx

And you should configure smartd to monitor and run tests periodically.
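
The per-disk check suggested above can be scripted. This is only a minimal
sketch: check_smart_health is a hypothetical helper name, and it assumes ATA
disks, whose smartctl -H output contains a "test result: PASSED" line when
the drive reports itself healthy. Other disk types word that line
differently and would show up as CHECK here.

```shell
# Sketch: print a one-line health verdict for each disk given as an argument.
# check_smart_health is a hypothetical name, not part of smartmontools.
check_smart_health() {
    status=0
    for disk in "$@"; do
        # Assumption: ATA-style output ("... test result: PASSED").
        if smartctl -H "$disk" 2>/dev/null | grep -q 'test result: PASSED'; then
            echo "$disk OK"
        else
            echo "$disk CHECK"
            status=1
        fi
    done
    # Non-zero return if any disk did not report PASSED.
    return "$status"
}
```

Usage would look like: check_smart_health /dev/sd[b-f]
Any disk marked CHECK deserves a full look with smartctl -a before it goes
into a new array.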

On Thu, Dec 17, 2009 at 7:17 PM, Carl Karsten [off-list ref] wrote:
On Thu, Dec 17, 2009 at 9:40 AM, Majed B. [off-list ref] wrote:
quoted
I'm assuming you ran the command with the 2 external disks added to the array.
One question before proceeding: When you removed these 2 externals,
were there any changes on the array? Did you add/delete/modify any
files or rename them?
shutdown the box, unplugged drives, booted box.
quoted
What do you mean the 2 externals have had mkfs run on them? Is this
AFTER you removed the disks from the array? If so, they're useless
now.
That's what I figured.
quoted
The names of the disks have changed, and the names recorded in the
superblocks differ from what udev now reports:
sde was previously named sdg
sdf is sdf
sdb is sdb
sdc is sdc
sdd is sdd

According to the listing above, you have superblock info on: sdb, sdc,
sdd, sde, sdf; 5 disks out of 7 -- one of which is a spare.
sdb was a spare and according to other disks' info, it didn't resync
so it has no useful data to aid in recovery.
So you're left with 4 out of 6 disks + 1 spare.

You have a chance of running the array in degraded mode using sde,
sdc, sdd, sdf, assuming these disks are sane.

Try running this command: mdadm -Af /dev/md0 /dev/sde /dev/sdc /dev/sdd /dev/sdf
mdadm: forcing event count in /dev/sdf(1) from 97276 upto 580158
mdadm: /dev/md0 has been started with 4 drives (out of 6).
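
mdadm had to force the event count because sdf's superblock was last
updated long before the others (97276 events against 580158). A minimal
sketch for pulling the Events counter out of mdadm -E output so that stale
members stand out. The function names are hypothetical, and the code only
parses text on standard input in the 0.90 layout shown in this thread:

```shell
# Sketch: list "device event-count" pairs from `mdadm -E` output on stdin.
events_by_device() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }   # e.g. "/dev/sdc:"
        $1 == "Events" { print dev, $3 }              # e.g. "Events : 580158"
    '
}

# Sketch: print only the devices whose event count is below the highest seen.
stale_devices() {
    events_by_device | awk '
        { dev[NR] = $1; ev[NR] = $2; if ($2 + 0 > max) max = $2 + 0 }
        END { for (i = 1; i <= NR; i++) if (ev[i] + 0 < max) print dev[i], ev[i] }
    '
}
```

Usage would look like: mdadm -E /dev/sd[a-z] 2>/dev/null | stale_devices
A member that lags far behind holds old data, so forcing it into the array
can expose stale blocks.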

quoted
then check: cat /proc/mdstat
root@dhcp128:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdf[1] sde[5] sdd[3] sdc[2]
     5860549632 blocks level 6, 64k chunk, algorithm 2 [6/4] [_UUU_U]

unused devices: <none>
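
The [6/4] [_UUU_U] field above is what shows the array is degraded: 6
member slots, 4 of them active. A minimal sketch that picks this out of
/proc/mdstat; degraded_arrays is a hypothetical helper name, and the
parsing assumes the mdstat layout shown in this thread:

```shell
# Sketch: print "mdN active/total" for every array with missing members.
# Reads the file given as $1, or /proc/mdstat when no argument is passed.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        match($0, /\[[0-9]+\/[0-9]+\]/) {
            # Strip the brackets, then split "total/active".
            split(substr($0, RSTART + 1, RLENGTH - 2), n, "/")
            if (n[2] + 0 < n[1] + 0) print name, n[2] "/" n[1]
        }
    ' "${1:-/proc/mdstat}"
}
```

No output means every listed array has all of its members.
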
quoted
If the remaining disks are sane, it should run the array in degraded
mode. Hopefully.
dmesg
[31828.093953] md: md0 stopped.
[31838.929607] md: bind<sdc>
[31838.931455] md: bind<sdd>
[31838.932073] md: bind<sde>
[31838.932376] md: bind<sdf>
[31838.973346] raid5: device sdf operational as raid disk 1
[31838.973349] raid5: device sde operational as raid disk 5
[31838.973351] raid5: device sdd operational as raid disk 3
[31838.973353] raid5: device sdc operational as raid disk 2
[31838.973787] raid5: allocated 6307kB for md0
[31838.974165] raid5: raid level 6 set md0 active with 4 out of 6 devices, algorithm 2
[31839.066014] RAID5 conf printout:
[31839.066016]  --- rd:6 wd:4
[31839.066018]  disk 1, o:1, dev:sdf
[31839.066020]  disk 2, o:1, dev:sdc
[31839.066022]  disk 3, o:1, dev:sdd
[31839.066024]  disk 5, o:1, dev:sde
[31839.066066] md0: detected capacity change from 0 to 6001202823168
[31839.066188]  md0: p1

root@dhcp128:/media# fdisk -l /dev/md0
Disk /dev/md0: 6001.2 GB, 6001202823168 bytes
255 heads, 63 sectors/track, 729604 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x96af0591
   Device Boot      Start         End      Blocks   Id  System
/dev/md0p1               1      182401  1465136001   83  Linux

and now the bad news:
mount /dev/md0p1 md0p1
mount: wrong fs type, bad option, bad superblock on /dev/md0p1

[32359.038796] raid5: Disk failure on sde, disabling device.
[32359.038797] raid5: Operation continuing on 3 devices.
quoted
If that doesn't work, I'd say you're better off scrapping & restoring
your data back onto a new array rather than waste more time fiddling
with superblocks.
Yep.  starting that now.

This is exactly what I was expecting - very few things to try (like 1)
and a very clear pass/fail test.

Thanks for helping me get through this.

quoted
On Thu, Dec 17, 2009 at 6:06 PM, Carl Karsten [off-list ref] wrote:
quoted
I brought back the 2 externals, which have had mkfs run on them, but
maybe the extra superblocks will help (doubt it, but couldn't hurt)

root@dhcp128:/media# mdadm -E /dev/sd[a-z]
mdadm: No md superblock detected on /dev/sda.
/dev/sdb:
         Magic : a92b4efc
       Version : 00.90.00
          UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
 Creation Time : Wed Mar 25 21:04:08 2009
    Raid Level : raid6
 Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
    Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
  Raid Devices : 6
 Total Devices : 6
Preferred Minor : 0

   Update Time : Tue Mar 31 23:08:02 2009
         State : clean
 Active Devices : 5
Working Devices : 6
 Failed Devices : 1
 Spare Devices : 1
      Checksum : a4fbb93a - correct
        Events : 8430

    Chunk Size : 64K

     Number   Major   Minor   RaidDevice State
this     6       8       16        6      spare   /dev/sdb

  0     0       8        0        0      active sync   /dev/sda
  1     1       8       64        1      active sync   /dev/sde
  2     2       8       32        2      active sync   /dev/sdc
  3     3       8       48        3      active sync   /dev/sdd
  4     4       0        0        4      faulty removed
  5     5       8       80        5      active sync   /dev/sdf
  6     6       8       16        6      spare   /dev/sdb
/dev/sdc:
         Magic : a92b4efc
       Version : 00.90.00
          UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
 Creation Time : Wed Mar 25 21:04:08 2009
    Raid Level : raid6
 Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
    Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
  Raid Devices : 6
 Total Devices : 4
Preferred Minor : 0

   Update Time : Sun Jul 12 11:31:47 2009
         State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 2
 Spare Devices : 0
      Checksum : a59452db - correct
        Events : 580158

    Chunk Size : 64K

     Number   Major   Minor   RaidDevice State
this     2       8       32        2      active sync   /dev/sdc

  0     0       8        0        0      active sync   /dev/sda
  1     1       0        0        1      faulty removed
  2     2       8       32        2      active sync   /dev/sdc
  3     3       8       48        3      active sync   /dev/sdd
  4     4       0        0        4      faulty removed
  5     5       8       96        5      active sync   /dev/sdg
/dev/sdd:
         Magic : a92b4efc
       Version : 00.90.00
          UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
 Creation Time : Wed Mar 25 21:04:08 2009
    Raid Level : raid6
 Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
    Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
  Raid Devices : 6
 Total Devices : 4
Preferred Minor : 0

   Update Time : Sun Jul 12 11:31:47 2009
         State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 2
 Spare Devices : 0
      Checksum : a59452ed - correct
        Events : 580158

    Chunk Size : 64K

     Number   Major   Minor   RaidDevice State
this     3       8       48        3      active sync   /dev/sdd

  0     0       8        0        0      active sync   /dev/sda
  1     1       0        0        1      faulty removed
  2     2       8       32        2      active sync   /dev/sdc
  3     3       8       48        3      active sync   /dev/sdd
  4     4       0        0        4      faulty removed
  5     5       8       96        5      active sync   /dev/sdg
/dev/sde:
         Magic : a92b4efc
       Version : 00.90.00
          UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
 Creation Time : Wed Mar 25 21:04:08 2009
    Raid Level : raid6
 Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
    Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
  Raid Devices : 6
 Total Devices : 4
Preferred Minor : 0

   Update Time : Sun Jul 12 11:31:47 2009
         State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 2
 Spare Devices : 0
      Checksum : a5945321 - correct
        Events : 580158

    Chunk Size : 64K

     Number   Major   Minor   RaidDevice State
this     5       8       96        5      active sync   /dev/sdg

  0     0       8        0        0      active sync   /dev/sda
  1     1       0        0        1      faulty removed
  2     2       8       32        2      active sync   /dev/sdc
  3     3       8       48        3      active sync   /dev/sdd
  4     4       0        0        4      faulty removed
  5     5       8       96        5      active sync   /dev/sdg
/dev/sdf:
         Magic : a92b4efc
       Version : 00.90.00
          UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
 Creation Time : Wed Mar 25 21:04:08 2009
    Raid Level : raid6
 Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
    Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
  Raid Devices : 6
 Total Devices : 5
Preferred Minor : 0

   Update Time : Wed Apr  8 11:13:32 2009
         State : clean
 Active Devices : 5
Working Devices : 5
 Failed Devices : 1
 Spare Devices : 0
      Checksum : a5085415 - correct
        Events : 97276

    Chunk Size : 64K

     Number   Major   Minor   RaidDevice State
this     1       8       80        1      active sync   /dev/sdf

  0     0       8        0        0      active sync   /dev/sda
  1     1       8       80        1      active sync   /dev/sdf
  2     2       8       32        2      active sync   /dev/sdc
  3     3       8       48        3      active sync   /dev/sdd
  4     4       0        0        4      faulty removed
  5     5       8       96        5      active sync   /dev/sdg
mdadm: No md superblock detected on /dev/sdg.



On Thu, Dec 17, 2009 at 8:39 AM, Majed B. [off-list ref] wrote:
quoted
You can't copy and change bytes to identify disks.

To check which disks belong to an array, do this:
mdadm -E /dev/sd[a-z]

The disks that you get info from belong to the existing array(s).

In the first email you sent you included an examine output for one of
the disks that listed another disk as a spare (sdb). The output of
examine should shed more light.

On Thu, Dec 17, 2009 at 5:15 PM, Carl Karsten [off-list ref] wrote:
quoted
On Thu, Dec 17, 2009 at 4:35 AM, Majed B. [off-list ref] wrote:
quoted
I have misread the information you've provided, so allow me to correct myself:

You're running a RAID6 array, with 2 disks lost/failed. Any disk loss
after that will cause data loss since you have no redundancy (2 disks
died).
right - but I am not sure if data loss has occurred, where data is the
data being stored on the raid, not the raid metadata.

My guess is I need to copy the raid superblock from one of the other
disks (say sdb), find the bytes that identify the disk and change from
sdb to sda.
quoted
I believe it's still possible to reassemble the array, but you only
need to remove the MBR. See this page for information:
http://www.cyberciti.biz/faq/linux-how-to-uninstall-grub/
dd if=/dev/zero of=/dev/sdX bs=446 count=1

Before proceeding, provide the output of cat /proc/mdstat
root@dhcp128:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>

quoted
Is the array currently running degraded or is it suspended?
um, not running, not sure I would call it suspended.
quoted
What happened to the spare disk assigned?
I don't understand.
quoted
Did it finish resyncing
before you installed grub on the wrong disk?
I think so.

I am fairly sure I could assemble the array before I installed grub.
quoted
On Thu, Dec 17, 2009 at 8:21 AM, Majed B. [off-list ref] wrote:
quoted
If your other disks are sane and you are able to run a degraded array, then
you can remove grub using dd then re-add the disk to the array.

To clear the first 1MB of the disk:
dd if=/dev/zero of=/dev/sdx bs=1M count=1
Replace sdx with the disk name that has grub.
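
Zeroing cannot be undone, so one cautious variation is to save the region
to a file first. Saving a copy is an addition here, not something suggested
in the thread. This is only a minimal sketch: backup_then_zero is a
hypothetical helper name, and bs=1M assumes GNU dd.

```shell
# Sketch: copy the first 1 MiB of a device to a backup file, then zero it.
# $1 = target device (or image file), $2 = backup file to create.
backup_then_zero() {
    target=$1
    backup=$2
    # Stop if the backup cannot be written; never zero without a copy.
    dd if="$target" of="$backup" bs=1M count=1 2>/dev/null || return 1
    # conv=notrunc matters only for image files; it keeps the rest intact.
    dd if=/dev/zero of="$target" bs=1M count=1 conv=notrunc 2>/dev/null
}
```

Usage would look like: backup_then_zero /dev/sdx /root/sdx-first-1M.bak
Double-check the device name before running it.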

On Dec 17, 2009 6:53 AM, "Carl Karsten" [off-list ref] wrote:

I took over a box that had 1 ide boot drive, 6 sata raid drives (4
internal, 2 external.)  I believe the 2 externals were redundant, so
could be removed.  so I did, and mkfs-ed them.  then I installed
ubuntu to the ide, and installed grub to sda, which turns out to be
the first sata.  which would be fine if the raid was on sda1, but it
is on sda, and now the raid won't assemble.  no surprise, and I do
have a backup of the data spread across 5 external drives.  but before
I abandon the array, I am wondering if I can fix it by recreating
mdadm's metadata on sda, given I have sd[bcd] to work with.

any suggestions?

root@dhcp128:~# mdadm --examine /dev/sd[abcd]
mdadm: No md superblock detected on /dev/sda.
/dev/sdb:
         Magic : a92b4efc
       Version : 00.90.00
          UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
 Creation Time : Wed Mar 25 21:04:08 2009
    Raid Level : raid6
 Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
    Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
  Raid Devices : 6
 Total Devices : 6
Preferred Minor : 0

   Update Time : Tue Mar 31 23:08:02 2009
         State : clean
 Active Devices : 5
Working Devices : 6
 Failed Devices : 1
 Spare Devices : 1
      Checksum : a4fbb93a - correct
        Events : 8430

    Chunk Size : 64K

     Number   Major   Minor   RaidDevice State
this     6       8       16        6      spare   /dev/sdb

  0     0       8        0        0      active sync   /dev/sda
  1     1       8       64        1      active sync   /dev/sde
  2     2       8       32        2      active sync   /dev/sdc
  3     3       8       48        3      active sync   /dev/sdd
  4     4       0        0        4      faulty removed
  5     5       8       80        5      active sync
  6     6       8       16        6      spare   /dev/sdb
/dev/sdc:
         Magic : a92b4efc
       Version : 00.90.00
          UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
 Creation Time : Wed Mar 25 21:04:08 2009
    Raid Level : raid6
 Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
    Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
  Raid Devices : 6
 Total Devices : 4
Preferred Minor : 0

   Update Time : Sun Jul 12 11:31:47 2009
         State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 2
 Spare Devices : 0
      Checksum : a59452db - correct
        Events : 580158

    Chunk Size : 64K

     Number   Major   Minor   RaidDevice State
this     2       8       32        2      active sync   /dev/sdc

  0     0       8        0        0      active sync   /dev/sda
  1     1       0        0        1      faulty removed
  2     2       8       32        2      active sync   /dev/sdc
  3     3       8       48        3      active sync   /dev/sdd
  4     4       0        0        4      faulty removed
  5     5       8       96        5      active sync
/dev/sdd:
         Magic : a92b4efc
       Version : 00.90.00
          UUID : 8d0cf436:3fc2d2ef:93d71b24:b036cc6b
 Creation Time : Wed Mar 25 21:04:08 2009
    Raid Level : raid6
 Used Dev Size : 1465137408 (1397.26 GiB 1500.30 GB)
    Array Size : 5860549632 (5589.06 GiB 6001.20 GB)
  Raid Devices : 6
 Total Devices : 4
Preferred Minor : 0

   Update Time : Sun Jul 12 11:31:47 2009
         State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 2
 Spare Devices : 0
      Checksum : a59452ed - correct
        Events : 580158

    Chunk Size : 64K

     Number   Major   Minor   RaidDevice State
this     3       8       48        3      active sync   /dev/sdd

  0     0       8        0        0      active sync   /dev/sda
  1     1       0        0        1      faulty removed
  2     2       8       32        2      active sync   /dev/sdc
  3     3       8       48        3      active sync   /dev/sdd
  4     4       0        0        4      faulty removed
  5     5       8       96        5      active sync

--
Carl K
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


-- 
       Majed B.