Thread: 13 messages, 4 authors, 2013-05-10

Re: Help with recovering a RAID5 array

From: Mathias Burén <hidden>
Date: 2013-05-02 12:30:22

On 2 May 2013 13:24, Stefan Borggraefe [off-list ref] wrote:
Hi,

I am using a software RAID5 array on Ubuntu 12.04 (kernel
3.2.0-37-generic x86_64).

It consists of six 4 TB Hitachi drives and contains an ext4 file system.
There are no spare devices.

Yesterday evening I replaced a drive that showed SMART errors, and the
array started rebuilding its redundancy normally.

When I returned to this server this morning, the array was in the following
state:

md126 : active raid5 sdc1[7](S) sdh1[4] sdd1[3](F) sde1[0] sdg1[6] sdf1[2]
      19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4] [U_U_UU]
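In `/proc/mdstat`, `[6/4]` is configured members versus working members, and each character of `[U_U_UU]` is one slot in role order (`U` = in sync, `_` = missing or failed). A minimal Python sketch, assuming only that format (the helper name `decode_mdstat_status` is hypothetical), shows that slots 1 and 3 are down. That is two missing members, one more than RAID5's single-member redundancy covers.

```python
import re

def decode_mdstat_status(text: str) -> dict:
    """Parse the '[n/m] [UU_...]' part of a /proc/mdstat entry (hypothetical helper)."""
    m = re.search(r"\[(\d+)/(\d+)\]\s*\[([U_]+)\]", text)
    if m is None:
        raise ValueError("no '[n/m] [U_...]' status found")
    flags = m.group(3)
    return {
        "raid_disks": int(m.group(1)),   # members the array is configured for
        "working": int(m.group(2)),      # members currently in sync
        "missing_slots": [i for i, c in enumerate(flags) if c == "_"],
    }

status = "19535086080 blocks super 1.2 level 5, 512k chunk, algorithm 2 [6/4] [U_U_UU]"
print(decode_mdstat_status(status))
# {'raid_disks': 6, 'working': 4, 'missing_slots': [1, 3]}
```

Slot 3 corresponds to sdd1 ("Active device 3" in the `--examine` output), and slot 1 is the one sdc1 was being rebuilt into.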

sdc is the newly added hard disk, but now sdd has failed too. :( It would be
great if there were a way to get this RAID5 working again. Perhaps sdc1
could then be fully added to the array, and after that sdd could be replaced as well.

I have not started experimenting with or changing this array in any way, but wanted
to ask here for assistance first. Thank you for your help!

mdadm --examine /dev/sd[cdegfh]1 | egrep 'Event|/dev/sd'

shows

/dev/sdc1:
         Events : 494
/dev/sdd1:
         Events : 478
/dev/sde1:
         Events : 494
/dev/sdf1:
         Events : 494
/dev/sdg1:
         Events : 494
/dev/sdh1:
         Events : 494
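The event counts are the key detail here: md increments the `Events` counter in each member's superblock as the array state changes, so a member with a lower count missed the most recent updates. A small Python sketch (the helper names are hypothetical; it parses only the `egrep`-filtered output shown above) picks out the lagging device and the size of the gap:

```python
import re

def parse_event_counts(text: str) -> dict:
    """Map each '/dev/...:' header in mdadm --examine output to its Events value."""
    counts, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("/dev/") and line.endswith(":"):
            current = line[:-1]          # e.g. '/dev/sdd1'
            continue
        m = re.match(r"Events\s*:\s*(\d+)$", line)
        if m and current is not None:
            counts[current] = int(m.group(1))
    return counts

def stale_members(counts: dict) -> dict:
    """Devices whose event count trails the newest one, with the size of the gap."""
    newest = max(counts.values())
    return {dev: newest - ev for dev, ev in counts.items() if ev < newest}

# The filtered output from this message.
sample = """\
/dev/sdc1:
         Events : 494
/dev/sdd1:
         Events : 478
/dev/sde1:
         Events : 494
/dev/sdf1:
         Events : 494
/dev/sdg1:
         Events : 494
/dev/sdh1:
         Events : 494
"""
print(stale_members(parse_event_counts(sample)))  # {'/dev/sdd1': 16}
```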



mdadm --examine /dev/sd[cdegfh]1

shows

/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 13051471:fba5785f:4365dea1:0670be37
           Name : teraturm:2  (local to host teraturm)
  Creation Time : Tue Feb  5 14:23:06 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
     Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 7433213e:0dd2e5ed:073dd59d:bf1f83d8

    Update Time : Tue Apr 30 10:06:55 2013
       Checksum : 9e83f72 - correct
         Events : 494

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : spare
   Array State : A.A.AA ('A' == active, '.' == missing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 13051471:fba5785f:4365dea1:0670be37
           Name : teraturm:2  (local to host teraturm)
  Creation Time : Tue Feb  5 14:23:06 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
     Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : active
    Device UUID : c2e5423f:6d91a061:c3f55aa7:6d1cec87

    Update Time : Mon Apr 29 17:24:26 2013
       Checksum : 37b97776 - correct
         Events : 478

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 3
   Array State : AAAAAA ('A' == active, '.' == missing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 13051471:fba5785f:4365dea1:0670be37
           Name : teraturm:2  (local to host teraturm)
  Creation Time : Tue Feb  5 14:23:06 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
     Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 68207885:02c05297:8ef62633:65b83839

    Update Time : Tue Apr 30 10:06:55 2013
       Checksum : f0b36c7f - correct
         Events : 494

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 0
   Array State : A.A.AA ('A' == active, '.' == missing)
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 13051471:fba5785f:4365dea1:0670be37
           Name : teraturm:2  (local to host teraturm)
  Creation Time : Tue Feb  5 14:23:06 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
     Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 7d328a98:6c02f550:ab1837c0:cb773ac1

    Update Time : Tue Apr 30 10:06:55 2013
       Checksum : d2799f34 - correct
         Events : 494

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 2
   Array State : A.A.AA ('A' == active, '.' == missing)
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 13051471:fba5785f:4365dea1:0670be37
           Name : teraturm:2  (local to host teraturm)
  Creation Time : Tue Feb  5 14:23:06 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
     Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 76b683b1:58e053ff:57ac0cfc:be114f75

    Update Time : Tue Apr 30 10:06:55 2013
       Checksum : 89bc2e05 - correct
         Events : 494

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 5
   Array State : A.A.AA ('A' == active, '.' == missing)
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : 13051471:fba5785f:4365dea1:0670be37
           Name : teraturm:2  (local to host teraturm)
  Creation Time : Tue Feb  5 14:23:06 2013
     Raid Level : raid5
   Raid Devices : 6

 Avail Dev Size : 7814035053 (3726.02 GiB 4000.79 GB)
     Array Size : 19535086080 (18630.11 GiB 20003.93 GB)
  Used Dev Size : 7814034432 (3726.02 GiB 4000.79 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 3c88705f:9f3add0e:d58d46a7:b40d02d7

    Update Time : Tue Apr 30 10:06:55 2013
       Checksum : 541f3913 - correct
         Events : 494

         Layout : left-symmetric
     Chunk Size : 512K

   Device Role : Active device 4
   Array State : A.A.AA ('A' == active, '.' == missing)
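The superblocks are consistent with the RAID5 capacity rule (usable space = (n - 1) × per-member size). This assumes `Used Dev Size` and `Data Offset` are in 512-byte sectors and `Array Size` is in KiB, which is what the GiB figures in parentheses imply. A minimal Python check, with illustrative variable names:

```python
SECTOR_BYTES = 512  # assumed unit of "Used Dev Size" and "Data Offset"

def raid5_capacity_kib(used_dev_sectors: int, raid_devices: int) -> int:
    """Usable RAID5 size in KiB: one member's worth of space per stripe holds parity."""
    return (raid_devices - 1) * used_dev_sectors * SECTOR_BYTES // 1024

used_dev_size = 7814034432   # "Used Dev Size", sectors
array_size = 19535086080     # "Array Size", KiB (also the "blocks" figure in /proc/mdstat)
chunk_sectors = 512 * 1024 // SECTOR_BYTES   # 512K chunk = 1024 sectors

print(raid5_capacity_kib(used_dev_size, 6) == array_size)  # True
print(used_dev_size % chunk_sectors)                        # 0: a whole number of chunks
print(2048 * SECTOR_BYTES // 2**20, "MiB")                  # Data Offset: 1 MiB
```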

This is the dmesg output from when the failure happened:

[6669459.855352] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855362] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.855368] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 2a 00 00 08 00
[6669459.855387] end_request: I/O error, dev sdd, sector 590910506
[6669459.855456] raid5_end_read_request: 14 callbacks suppressed
[6669459.855463] md/raid:md126: read error not correctable (sector 590910472 on sdd1).
[6669459.855490] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855496] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.855501] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 32 00 00 08 00
[6669459.855515] end_request: I/O error, dev sdd, sector 590910514
[6669459.855594] md/raid:md126: read error not correctable (sector 590910480 on sdd1).
[6669459.855608] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855611] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.855620] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 3a 00 00 08 00
[6669459.855648] end_request: I/O error, dev sdd, sector 590910522
[6669459.855710] md/raid:md126: read error not correctable (sector 590910488 on sdd1).
[6669459.855720] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855723] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.855727] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 42 00 00 08 00
[6669459.855737] end_request: I/O error, dev sdd, sector 590910530
[6669459.855796] md/raid:md126: read error not correctable (sector 590910496 on sdd1).
[6669459.855814] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855817] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.855821] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 4a 00 00 08 00
[6669459.855831] end_request: I/O error, dev sdd, sector 590910538
[6669459.855889] md/raid:md126: read error not correctable (sector 590910504 on sdd1).
[6669459.855907] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855910] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.855914] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 52 00 00 08 00
[6669459.855924] end_request: I/O error, dev sdd, sector 590910546
[6669459.855982] md/raid:md126: read error not correctable (sector 590910512 on sdd1).
[6669459.855990] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.855992] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.855996] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 5a 00 00 08 00
[6669459.856004] end_request: I/O error, dev sdd, sector 590910554
[6669459.856062] md/raid:md126: read error not correctable (sector 590910520 on sdd1).
[6669459.856072] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856075] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.856079] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 62 00 00 08 00
[6669459.856088] end_request: I/O error, dev sdd, sector 590910562
[6669459.856153] md/raid:md126: read error not correctable (sector 590910528 on sdd1).
[6669459.856171] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856174] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.856178] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 6a 00 00 08 00
[6669459.856188] end_request: I/O error, dev sdd, sector 590910570
[6669459.856256] md/raid:md126: read error not correctable (sector 590910536 on sdd1).
[6669459.856265] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856268] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.856272] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 72 00 00 08 00
[6669459.856281] end_request: I/O error, dev sdd, sector 590910578
[6669459.856346] md/raid:md126: read error not correctable (sector 590910544 on sdd1).
[6669459.856364] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856368] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.856374] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 7a 00 00 08 00
[6669459.856385] end_request: I/O error, dev sdd, sector 590910586
[6669459.856445] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856449] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.856456] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 82 00 00 08 00
[6669459.856466] end_request: I/O error, dev sdd, sector 590910594
[6669459.856526] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856530] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.856537] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 8a 00 00 08 00
[6669459.856547] end_request: I/O error, dev sdd, sector 590910602
[6669459.856607] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856611] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.856617] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 92 00 00 08 00
[6669459.856628] end_request: I/O error, dev sdd, sector 590910610
[6669459.856687] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856691] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.856697] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 9a 00 00 08 00
[6669459.856707] end_request: I/O error, dev sdd, sector 590910618
[6669459.856767] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856772] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.856778] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 a2 00 00 08 00
[6669459.856788] end_request: I/O error, dev sdd, sector 590910626
[6669459.856847] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856851] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.856859] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 aa 00 00 08 00
[6669459.856869] end_request: I/O error, dev sdd, sector 590910634
[6669459.856928] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.856932] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.856938] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 b2 00 00 08 00
[6669459.856949] end_request: I/O error, dev sdd, sector 590910642
[6669459.857008] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857011] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.857018] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 ba 00 00 08 00
[6669459.857028] end_request: I/O error, dev sdd, sector 590910650
[6669459.857088] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857092] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.857098] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 c2 00 00 08 00
[6669459.857109] end_request: I/O error, dev sdd, sector 590910658
[6669459.857168] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857171] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.857178] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 ca 00 00 08 00
[6669459.857188] end_request: I/O error, dev sdd, sector 590910666
[6669459.857248] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857251] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.857258] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 d2 00 00 08 00
[6669459.857269] end_request: I/O error, dev sdd, sector 590910674
[6669459.857328] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857333] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.857339] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 da 00 00 08 00
[6669459.857349] end_request: I/O error, dev sdd, sector 590910682
[6669459.857408] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857412] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.857418] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 94 e2 00 00 08 00
[6669459.857429] end_request: I/O error, dev sdd, sector 590910690
[6669459.857488] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857492] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.857499] sd 6:1:10:0: [sdd] CDB: Read(10): 28 00 23 38 93 4a 00 00 08 00
[6669459.857509] end_request: I/O error, dev sdd, sector 590910282
[6669459.857569] sd 6:1:10:0: [sdd] Unhandled error code
[6669459.857573] sd 6:1:10:0: [sdd]  Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
[6669459.857579] sd 6:1:10:0: [sdd] CDB:
[6669459.857585] aacraid: Host adapter abort request (6,1,10,0)
[6669459.857639] Read(10): 28 00 23 38 93 42 00 00 08 00
[6669459.857648] end_request: I/O error, dev sdd, sector 590910274
[6669459.857844] aacraid: Host adapter reset request. SCSI hang ?
[6669470.028090] RAID conf printout:
[6669470.028097]  --- level:5 rd:6 wd:4
[6669470.028101]  disk 0, o:1, dev:sde1
[6669470.028105]  disk 1, o:1, dev:sdc1
[6669470.028109]  disk 2, o:1, dev:sdf1
[6669470.028112]  disk 3, o:0, dev:sdd1
[6669470.028115]  disk 4, o:1, dev:sdh1
[6669470.028118]  disk 5, o:1, dev:sdg1
[6669470.034462] RAID conf printout:
[6669470.034464]  --- level:5 rd:6 wd:4
[6669470.034465]  disk 0, o:1, dev:sde1
[6669470.034466]  disk 2, o:1, dev:sdf1
[6669470.034467]  disk 3, o:0, dev:sdd1
[6669470.034468]  disk 4, o:1, dev:sdh1
[6669470.034469]  disk 5, o:1, dev:sdg1
[6669470.034484] RAID conf printout:
[6669470.034486]  --- level:5 rd:6 wd:4
[6669470.034489]  disk 0, o:1, dev:sde1
[6669470.034491]  disk 2, o:1, dev:sdf1
[6669470.034494]  disk 3, o:0, dev:sdd1
[6669470.034496]  disk 4, o:1, dev:sdh1
[6669470.034499]  disk 5, o:1, dev:sdg1
[6669470.034571] RAID conf printout:
[6669470.034577]  --- level:5 rd:6 wd:4
[6669470.034581]  disk 0, o:1, dev:sde1
[6669470.034584]  disk 2, o:1, dev:sdf1
[6669470.034587]  disk 4, o:1, dev:sdh1
[6669470.034589]  disk 5, o:1, dev:sdg1
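The failing reads can be cross-checked against the `end_request` lines. A SCSI READ(10) CDB carries opcode `0x28` in byte 0, a big-endian logical block address in bytes 2 to 5, and the transfer length in bytes 7 and 8. A small Python decoder (the function name is made up for illustration) shows that each CDB asks for 8 blocks starting exactly at the sector `end_request` reports, and that successive requests step the LBA by 8:

```python
def decode_read10(cdb_hex: str) -> tuple:
    """Decode a SCSI READ(10) CDB into (logical block address, transfer length)."""
    cdb = bytes.fromhex(cdb_hex)             # whitespace between bytes is ignored
    if len(cdb) != 10 or cdb[0] != 0x28:     # 0x28 = READ(10) opcode
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: LBA, big-endian
    blocks = int.from_bytes(cdb[7:9], "big") # bytes 7-8: number of blocks
    return lba, blocks

# CDBs copied from the dmesg output above, paired with the sector end_request printed.
for cdb, sector in [("28 00 23 38 94 2a 00 00 08 00", 590910506),
                    ("28 00 23 38 94 32 00 00 08 00", 590910514),
                    ("28 00 23 38 93 4a 00 00 08 00", 590910282)]:
    lba, blocks = decode_read10(cdb)
    print(lba, blocks, lba == sector)
```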

Please let me know if you need any more information.
--
Best regards,
Stefan Borggraefe
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

I won't scold you for using RAID5 instead of RAID6 with this number of
drives, and especially with drives of this size.

Could you please post the output of smartctl -a (from smartmontools) for
each device?

That way we can verify which HDDs are broken, before proceeding.
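To compare the `smartctl -a` output across six drives, it can help to pull out a few attributes commonly watched on failing disks. This is a hedged sketch, not a diagnosis tool. The attribute selection is a common convention rather than anything stated in this thread, it only understands the ATA attribute table layout, and the helper name and sample rows are invented; the values are not output from any drive discussed here.

```python
# Attributes commonly checked on a suspect drive (a convention, not a rule).
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def smart_raw_values(smartctl_output: str) -> dict:
    """Return RAW_VALUE for the watched attributes found in 'smartctl -a' output."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # ATA attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCH:
            if fields[9].isdigit():
                found[fields[1]] = int(fields[9])
    return found

# Invented sample rows for illustration only; NOT from any drive in this thread.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       3
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""
print(smart_raw_values(sample))
# {'Reallocated_Sector_Ct': 0, 'Current_Pending_Sector': 3, 'Offline_Uncorrectable': 0}
```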

Mathias