Thread (14 messages), 3 authors, 2013-10-18

Re: Problem diagnosing rebuilding raid5 array

From: NeilBrown <hidden>
Date: 2013-10-16 06:11:57

On Mon, 14 Oct 2013 12:31:04 -0400 peter@steinhoff.se wrote:
Hi!

I'm having some problems with a raid 5 array and I'm not sure how to  
diagnose the problem and how to proceed so I figured I need to ask the  
experts :-)

I actually suspect I may have several problems at the same time.

The machine has two raid arrays, one raid 1 (md0) and one raid 5  
(md1). The raid 5 array consists of 5 x 2TB WD RE4-GP drives.

I found some read errors in the log on /dev/sdh so I replaced it with  
a new RE4 GP drive and did mdadm --add /dev/md1 /dev/sdh.

The array was rebuilding and I left it for the night.

In the morning cat /proc/mdstat showed that 2 drives were down. I may  
remember incorrectly but I think that /dev/sdh showed up as a spare  
and another drive showed fail but the array showed up as active.

Anyway, I'm not sure which drive showed fail but I disconnected the  
system for more diagnosis. This was a couple of days ago.

I found that the CPU fan had stopped working and replaced it. The case  
has several fans and the heatsink seemed cool even without the fan  
(it's an i3-530 that does nothing more than samba so it's mostly  
idle). Possibly the hard drives have been running hotter than normal for  
a while though.

Anyway, now when I reboot I get this:
cat /proc/mdstat
Personalities : [raid1]
md1 : inactive sdd[1](S) sdh[5](S) sdg[4](S) sdf[2](S) sde[0](S)
       9767572480 blocks

md0 : active raid1 sda[0] sdb[1]
       1953514496 blocks [2/2] [UU]

unused devices: <none>


I'm not sure what is happening and what my next step is. I would  
appreciate any help on this so I don't screw up the system more than  
it already is :-)
We have no way of knowing how far recovery progressed onto sdh, so you need
to exclude it.  With v1.x metadata we would know ... but it wouldn't really
help that much.

Your only option is to do a --force assemble of the other devices.
sde is a little bit out of date, but it cannot be much out of date as the
array would have stopped handling writes as soon as it failed.
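
For illustration, the reasoning above can be sketched in Python. This is only a sketch under assumptions: the `EXAMINE` text is a trimmed copy of the "mdadm --examine" output quoted in this thread, the helper names `parse_examine` and `plan_force_assemble` are hypothetical, and the regular expressions only target the 0.90-metadata layout shown here. The script does not run mdadm; it only builds the command line and reports how far each member's event count lags behind the newest one.

```python
import re

# Trimmed fields from the `mdadm --examine` output quoted in this thread.
EXAMINE = """\
/dev/sdd:
          Events : 1288444
this     1       8       48        1      active sync   /dev/sdd
/dev/sde:
          Events : 1288428
this     0       8       64        0      active sync   /dev/sde
/dev/sdf:
          Events : 1288444
this     2       8       80        2      active sync   /dev/sdf
/dev/sdg:
          Events : 1288444
this     4       8       96        4      active sync   /dev/sdg
/dev/sdh:
          Events : 1288444
this     5       8      112        5      spare   /dev/sdh
"""


def parse_examine(text):
    """Return {device: {"events": int, "state": str}} (hypothetical helper)."""
    members = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(/dev/\S+):\s*$", line)
        if header:
            current = header.group(1)
            members[current] = {}
            continue
        if current is None:
            continue
        events = re.match(r"^\s*Events\s*:\s*(\d+)", line)
        if events:
            members[current]["events"] = int(events.group(1))
        # The "this" row carries the device's own view of its role.
        this_row = re.match(
            r"^this\s+\d+\s+\d+\s+\d+\s+\d+\s+(.+?)\s+/dev/\S+\s*$", line
        )
        if this_row:
            members[current]["state"] = this_row.group(1)
    return members


def plan_force_assemble(members, array="/dev/md1"):
    """Build (but do not run) the forced-assemble command, excluding spares."""
    # A spare's recovery progress is unknown with 0.90 metadata, so leave it out.
    usable = {dev: m for dev, m in members.items() if m["state"] != "spare"}
    newest = max(m["events"] for m in usable.values())
    # Event-count lag of each remaining member relative to the newest one.
    lag = {dev: newest - m["events"] for dev, m in usable.items()}
    command = ["mdadm", "--assemble", "--force", array] + sorted(usable)
    return command, lag


members = parse_examine(EXAMINE)
command, lag = plan_force_assemble(members)
print(" ".join(command))
print(lag)
```

With the values from this thread, the command comes out as "mdadm --assemble --force /dev/md1 /dev/sdd /dev/sde /dev/sdf /dev/sdg", and sde is the only member that lags, by 16 events (1288444 versus 1288428). The currently inactive md1 would probably have to be stopped first with "mdadm --stop /dev/md1" before the forced assemble can claim the devices.
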

This will assemble the array degraded.  You should then 'fsck' and do
anything else to check that the data is OK.

Then you need to check that all your drives and your system are good (if
you haven't already), then add a good drive as a spare and let it rebuild.

NeilBrown

Below is the output of "mdadm --examine" for the drives in the raid 5 array.

BTW, don't know if it matters but the system is running an older  
debian (lenny?) with a 2.6.32 backport kernel, mdadm version is 2.6.7.2.

Best Regards,
Peter

mdadm --examine /dev/sd?
/dev/sdd:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 61a6a879:adb7ac7b:86c7b55e:eb5cc2b6
   Creation Time : Thu Jun 24 15:12:41 2010
      Raid Level : raid5
   Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
      Array Size : 7814057984 (7452.07 GiB 8001.60 GB)
    Raid Devices : 5
   Total Devices : 5
Preferred Minor : 1

     Update Time : Wed Oct  9 20:29:41 2013
           State : clean
  Active Devices : 3
Working Devices : 4
  Failed Devices : 1
   Spare Devices : 1
        Checksum : 3dc0af1a - correct
          Events : 1288444

          Layout : left-symmetric
      Chunk Size : 128K

       Number   Major   Minor   RaidDevice State
this     1       8       48        1      active sync   /dev/sdd

    0     0       0        0        0      removed
    1     1       8       48        1      active sync   /dev/sdd
    2     2       8       80        2      active sync   /dev/sdf
    3     3       0        0        3      faulty removed
    4     4       8       96        4      active sync   /dev/sdg
    5     5       8      112        5      spare   /dev/sdh


/dev/sde:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 61a6a879:adb7ac7b:86c7b55e:eb5cc2b6
   Creation Time : Thu Jun 24 15:12:41 2010
      Raid Level : raid5
   Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
      Array Size : 7814057984 (7452.07 GiB 8001.60 GB)
    Raid Devices : 5
   Total Devices : 5
Preferred Minor : 1

     Update Time : Tue Oct  8 03:26:05 2013
           State : clean
  Active Devices : 4
Working Devices : 5
  Failed Devices : 1
   Spare Devices : 1
        Checksum : 3dbe6d93 - correct
          Events : 1288428

          Layout : left-symmetric
      Chunk Size : 128K

       Number   Major   Minor   RaidDevice State
this     0       8       64        0      active sync   /dev/sde

    0     0       8       64        0      active sync   /dev/sde
    1     1       8       48        1      active sync   /dev/sdd
    2     2       8       80        2      active sync   /dev/sdf
    3     3       0        0        3      faulty removed
    4     4       8       96        4      active sync   /dev/sdg
    5     5       8      112        5      spare   /dev/sdh


/dev/sdf:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 61a6a879:adb7ac7b:86c7b55e:eb5cc2b6
   Creation Time : Thu Jun 24 15:12:41 2010
      Raid Level : raid5
   Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
      Array Size : 7814057984 (7452.07 GiB 8001.60 GB)
    Raid Devices : 5
   Total Devices : 5
Preferred Minor : 1

     Update Time : Wed Oct  9 20:29:41 2013
           State : clean
  Active Devices : 3
Working Devices : 4
  Failed Devices : 1
   Spare Devices : 1
        Checksum : 3dc0af3c - correct
          Events : 1288444

          Layout : left-symmetric
      Chunk Size : 128K

       Number   Major   Minor   RaidDevice State
this     2       8       80        2      active sync   /dev/sdf

    0     0       0        0        0      removed
    1     1       8       48        1      active sync   /dev/sdd
    2     2       8       80        2      active sync   /dev/sdf
    3     3       0        0        3      faulty removed
    4     4       8       96        4      active sync   /dev/sdg
    5     5       8      112        5      spare   /dev/sdh


/dev/sdg:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 61a6a879:adb7ac7b:86c7b55e:eb5cc2b6
   Creation Time : Thu Jun 24 15:12:41 2010
      Raid Level : raid5
   Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
      Array Size : 7814057984 (7452.07 GiB 8001.60 GB)
    Raid Devices : 5
   Total Devices : 5
Preferred Minor : 1

     Update Time : Wed Oct  9 20:29:41 2013
           State : clean
  Active Devices : 3
Working Devices : 4
  Failed Devices : 1
   Spare Devices : 1
        Checksum : 3dc0af50 - correct
          Events : 1288444

          Layout : left-symmetric
      Chunk Size : 128K

       Number   Major   Minor   RaidDevice State
this     4       8       96        4      active sync   /dev/sdg

    0     0       0        0        0      removed
    1     1       8       48        1      active sync   /dev/sdd
    2     2       8       80        2      active sync   /dev/sdf
    3     3       0        0        3      faulty removed
    4     4       8       96        4      active sync   /dev/sdg
    5     5       8      112        5      spare   /dev/sdh


/dev/sdh:
           Magic : a92b4efc
         Version : 00.90.00
            UUID : 61a6a879:adb7ac7b:86c7b55e:eb5cc2b6
   Creation Time : Thu Jun 24 15:12:41 2010
      Raid Level : raid5
   Used Dev Size : 1953514496 (1863.02 GiB 2000.40 GB)
      Array Size : 7814057984 (7452.07 GiB 8001.60 GB)
    Raid Devices : 5
   Total Devices : 5
Preferred Minor : 1

     Update Time : Wed Oct  9 20:29:41 2013
           State : clean
  Active Devices : 3
Working Devices : 4
  Failed Devices : 1
   Spare Devices : 1
        Checksum : 3dc0af5c - correct
          Events : 1288444

          Layout : left-symmetric
      Chunk Size : 128K

       Number   Major   Minor   RaidDevice State
this     5       8      112        5      spare   /dev/sdh

    0     0       0        0        0      removed
    1     1       8       48        1      active sync   /dev/sdd
    2     2       8       80        2      active sync   /dev/sdf
    3     3       0        0        3      faulty removed
    4     4       8       96        4      active sync   /dev/sdg
    5     5       8      112        5      spare   /dev/sdh

