Re: RAID6 growing interrupted, array won't assemble or resume growing

From: Nic Wolfe <hidden>
Date: 2013-06-07 04:15:22

My original post had an error: sda is my boot drive and is not part of
the array. The 6th drive is as follows:

/dev/sdg:
          Magic : a92b4efc
        Version : 00.91.00
           UUID : 4672ced4:81401dbc:52723fc8:3fe02f5a (local to host MidgetNAS)
  Creation Time : Wed Jun  2 21:11:18 2010
     Raid Level : raid6
  Used Dev Size : 1953431488 (1862.94 GiB 2000.31 GB)
     Array Size : 7813725952 (7451.75 GiB 8001.26 GB)
   Raid Devices : 6
  Total Devices : 6
Preferred Minor : 1

  Reshape pos'n : 677888 (662.11 MiB 694.16 MB)
  Delta Devices : 1 (5->6)

    Update Time : Mon Oct 22 21:08:29 2012
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 4
  Spare Devices : 0
       Checksum : 146beaa7 - correct
         Events : 0.1323362

     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     5      65        0        5      active sync

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       0        0        3      faulty removed
   4     4       0        0        4      faulty removed
   5     5      65        0        5      active sync
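
For comparing members, the fields that matter most in this output can be
pulled out with a few lines of Python. This is only a sketch: the sample
text is an excerpt of the /dev/sdg dump above, the helper names
(parse_examine, leading_int) are made up for illustration, and it
assumes the reshape position and the array size are both reported in KiB
against the same address space.

```python
import re

# Excerpt of the `mdadm --examine` output for /dev/sdg shown above.
EXAMINE = """\
  Used Dev Size : 1953431488 (1862.94 GiB 2000.31 GB)
     Array Size : 7813725952 (7451.75 GiB 8001.26 GB)
   Raid Devices : 6
  Reshape pos'n : 677888 (662.11 MiB 694.16 MB)
  Delta Devices : 1 (5->6)
         Events : 0.1323362
"""


def parse_examine(text):
    """Return {field name: raw value} for every 'Key : value' line."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" : ")
        if sep:  # lines without ' : ' (device table, headers) are skipped
            fields[key.strip()] = value.strip()
    return fields


def leading_int(value):
    """Take the integer at the start of a value like '677888 (662.11 MiB ...)'."""
    return int(re.match(r"\d+", value).group())


fields = parse_examine(EXAMINE)
reshape_kib = leading_int(fields["Reshape pos'n"])
array_kib = leading_int(fields["Array Size"])

# Assumption: both numbers are KiB offsets into the array, so their
# ratio is a rough measure of how far the reshape had progressed.
fraction = reshape_kib / array_kib
print(f"events={fields['Events']} reshape progress={fraction:.4%}")
```

Running the same parse over each member's dump and comparing the Events
and Reshape pos'n values would show which drives agree with each other.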

> Your reshape is barely started.  Presumably you specified a --backup
> clause in the original --grow command.  You will need that file.

Embarrassingly, I have no backup file from the grow operation.

> Modern mdadm should be able to force assemble this and continue without
> problems.  Rather than operate within a questionable environment, I
> would strongly encourage you to perform the forced assembly with a
> recent live cd.  I personally use "SystemRescueCD", and I know it has
> the appropriate kernel support and tools.
>
> But.  You need to share more information about your hardware problems.
> Dmesg, etc.  There are commonly-encountered configuration problems that
> appear to be mysterious drive failures.  If you know all about error
> recovery control, please elaborate.  Otherwise, please share the output
> of "smartctl -x /dev/sdX" for all of your member devices.

My drives are connected to the machine through a poorly supported old
RAID card (rr2522), which required me to build the driver into my
kernel, so I don't think a live CD will work. I don't have enough SATA
ports in the machine to connect them all without it.

If it isn't obvious by now, I definitely don't know anything about
error recovery control.

Since it's a RAID card and not just an HBA, each drive is presented to
the OS as a single-drive JBOD array, which means the OS doesn't see any
SMART info. I can see (at least some) SMART information through my
RAID card admin console, and it claims they're all fine.

I am not having hardware problems at the moment; I only encountered
them when I had 16 drives running through the RAID card (I have
another 10-drive array). With the other array disconnected the card
seems to be behaving, and there's nothing suspicious that I can see in
dmesg.

In the meantime I will see if I can put together a machine with 6 SATA
ports and attempt to hook the drives up directly rather than through
the RAID card so I can use a live CD and get the SMART information for
you.
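
Once the drives are visible to the OS directly, the requested reports
could be gathered with something like the sketch below. It only wraps
the "smartctl -x" command quoted above; the function names are made up
for illustration, and the device paths are whatever is passed on the
command line (only /dev/sdg is named in this post, the rest are not
known here).

```python
import subprocess
import sys


def smartctl_command(device):
    """Build the argv for the report requested in the thread."""
    return ["smartctl", "-x", device]


def collect(devices):
    """Run smartctl on each device and return {device: report text}."""
    reports = {}
    for dev in devices:
        result = subprocess.run(
            smartctl_command(dev), capture_output=True, text=True
        )
        # smartctl's exit status is a bitmask, so a non-zero code does not
        # necessarily mean the drive could not be read; keep the output.
        reports[dev] = result.stdout
    return reports


if __name__ == "__main__":
    # Only arguments that look like device paths are used.
    members = [arg for arg in sys.argv[1:] if arg.startswith("/dev/")]
    for dev, report in collect(members).items():
        print(f"===== {dev} =====")
        print(report)
```

It would need to run as root, with one argument per member device.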

Thanks for the reply,

Nic
