Thread: 13 messages, 2 authors, 2013-09-16

Re: Advice for recovering array containing LUKS encrypted LVM volumes

From: P Orrifolius <hidden>
Date: 2013-08-06 01:54:05

Thanks for your response...

On 5 August 2013 01:09, Stan Hoeppner [off-list ref] wrote:
> On 8/4/2013 12:49 AM, P Orrifolius wrote:
>> I have an 8-device RAID6.  There are 4 drives on each of two
>> controllers and it looks like one of the controllers failed
>> temporarily.
> Are you certain the fault was caused by the HBA?  Hardware doesn't tend to
> fail temporarily.  It does often fail intermittently, before complete
> failure.  If you're certain it's the HBA you should replace it before
> attempting to bring the array back up.
>
> Do you have 2 SFF-8087 cables connected to two backplanes, or do you have
> 8 discrete SATA cables connected directly to the 8 drives?  WRT the set
> of 4 drives that dropped, do these four share a common power cable to
> the PSU that is not shared by the other 4 drives?
The full setup, an el-cheapo rig used for media, backups, etc. at home, is:

8x2TB SATA drives, split across two Vantec NexStar HX4 enclosures.
These separately powered enclosures have a single USB3 plug and a
single eSATA plug.  The documentation states that a "Port Multiplier
Is Required For eSATA".

The original intention was to connect them via eSATA directly to my
motherboard.  Subsequently I determined that my motherboard only
supports command-based port-multiplier switching, not FIS-based
switching.  I looked for a FIS-based port-multiplier card, but USB3
controllers (my motherboard has no USB3 support) seemed to be about a
quarter of the price, so I thought I'd try that instead.  lsusb tells
me that there are JMicron USB3-to-ATA bridges in the enclosures.

So, each enclosure is actually connected by a single USB3 connection
to one of two ports on a single controller.

Logs show that all 4 drives connected to one of the ports were reset
by the XHCI driver (more or less simultaneously), which dropped the
drives and failed the array.  In the original failure they came back
with the same /dev/sd? names within a few minutes, but I guess the
Event counts had already diverged by then.
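The Events count in each member's superblock is what mdadm uses to decide which members are stale, and `--force` is what tells it to assemble anyway. Comparing the counts across all eight members before forcing anything shows exactly which ones fell behind. A minimal Python sketch, assuming the text output of `mdadm --examine` on the members has been saved; the sample below is abbreviated and hypothetical (made-up device names and counts):

```python
import re

# Device header line such as "/dev/sdb1:" that starts each member's block.
DEVICE_RE = re.compile(r"^(/dev/\S+):\s*$")
# The "Events : N" line reported for each member's superblock.
EVENTS_RE = re.compile(r"^\s*Events\s*:\s*(\d+)\s*$")


def parse_event_counts(examine_text):
    """Map each device in saved `mdadm --examine` output to its Events count."""
    counts = {}
    current = None
    for line in examine_text.splitlines():
        dev = DEVICE_RE.match(line)
        if dev:
            current = dev.group(1)
            continue
        ev = EVENTS_RE.match(line)
        if ev and current is not None:
            counts[current] = int(ev.group(1))
    return counts


def stale_devices(counts):
    """Return, sorted, the devices whose Events count lags the newest member."""
    newest = max(counts.values())
    return sorted(dev for dev, n in counts.items() if n < newest)


if __name__ == "__main__":
    # Hypothetical, abbreviated output for two of the eight members.
    sample = """\
/dev/sdb1:
          Magic : a92b4efc
         Events : 1042
/dev/sdf1:
          Magic : a92b4efc
         Events : 1017
"""
    counts = parse_event_counts(sample)
    print(counts)
    print(stale_devices(counts))
```

If the stale members form exactly the group of four behind one USB port, that matches the controller/enclosure reset described above rather than individual disk failures.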

Perhaps that suggests the enclosure bridge is at fault, unless an
individual port on the controller freaked out.  It was definitely not a
power failure; it could be a USB3 cable issue, I guess.
> The point of these
> questions is to make sure you know the source of the problem before
> proceeding.  It could be the HBA, but it could also be a
> power/cable/connection problem, a data/cable/connection problem, or a
> failed backplane.  Cheap backplanes, i.e. cheap hot-swap drive cages,
> often cause the kind of intermittent problems you've described here.
Truth is, the USB3 setup has been a bit of a pain anyway: the enclosure
bridge seems to prevent direct fdisk'ing and SMART access, at least.  My
biggest concern was that it spits out copious 'needs
XHCI_TRUST_TX_LENGTH quirk?' warnings.
But I burned it in with a few weeks of read/write/validate work
without any apparent negative consequence and it's been fine for about
a year of uptime under light-moderate workload.  My trust was perhaps
misplaced.
>> What is the best/safest way to try and get the array up and working
>> again?  Should I just work through
>> https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID
> Again, get the hardware straightened out first or you'll continue to
> have problems.
It seems I'd probably be better off going to eSATA... any
recommendations on port-multiplying controllers?

Is the HighPoint RocketRAID 622 OK?  It's more expensive than I'd like,
but it's one of the few options that doesn't involve waiting on
international shipping.
> Once that's accomplished, skip to the "Force assembly" section in the
> guide you referenced.  You can ignore the preceding $OVERLAYS and disk
> copying steps because you know the problem wasn't/isn't the disks.
> Simply force assembly.
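For reference, the force-assembly step boils down to stopping the inactive array and reassembling it with `mdadm --assemble --force`, which assembles even when some members' metadata looks out of date. A minimal Python sketch that builds and prints those commands as a dry run; the array name `/dev/md0` and the member names are hypothetical placeholders, and stopping the array first is an assumption about its current (inactive) state:

```python
import shlex
import subprocess


def force_assemble_cmd(md_device, members):
    """Build the mdadm command that force-assembles md_device from members."""
    if not members:
        raise ValueError("need at least one member device")
    return ["mdadm", "--assemble", "--force", md_device, *members]


def run(cmd, dry_run=True):
    """Print the command; only execute it (needs root) when dry_run is False."""
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical member names; use the array's real members, or the
    # /dev/mapper overlay devices ($OVERLAYS) if overlays are set up.
    members = [f"/dev/sd{c}1" for c in "bcdefghi"]
    run(["mdadm", "--stop", "/dev/md0"])
    run(force_assemble_cmd("/dev/md0", members))
```

Keeping `dry_run=True` as the default means nothing touches the disks until the printed commands have been checked by eye.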
Good news is I worked through the recovery instructions, including
setting up the overlays (due to an excess of paranoia), and I was able
to mount each XFS filesystem and get a seemingly good result from
xfs_repair -n.
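The overlays in the wiki guide are device-mapper snapshots: each member disk is the snapshot origin, and a sparse file attached to a loop device receives all writes, so the real disks stay untouched while mdadm and xfs_repair experiment. A minimal Python sketch of the table line fed to `dmsetup create`; the device names and sector count below are hypothetical (the size would come from something like `blockdev --getsz`), and the chunk size of 8 sectors mirrors the guide's `P 8`:

```python
import os


def overlay_table(origin, cow_device, size_sectors, chunk_sectors=8):
    """Device-mapper table line for a persistent snapshot of origin.

    Format: <start> <length> snapshot <origin> <cow_device> <P|N> <chunksize>
    Writes to the snapshot land on cow_device; origin is not modified.
    """
    return f"0 {size_sectors} snapshot {origin} {cow_device} P {chunk_sectors}"


def dmsetup_cmd(origin):
    """Command that reads the table on stdin and creates /dev/mapper/<name>."""
    return ["dmsetup", "create", os.path.basename(origin)]


if __name__ == "__main__":
    # Hypothetical: a 2 TB member and the loop device backing its overlay file.
    print(overlay_table("/dev/sdb", "/dev/loop0", 3907029168))
    print(dmsetup_cmd("/dev/sdb"))
```

The resulting /dev/mapper devices are what the guide collects in $OVERLAYS; once the checks pass, the overlays can be removed and the same steps repeated on the real members.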

I haven't managed to bring my additional backups up to date yet,
because the USB reset happened again whilst I was trying, but I presume
the data will be OK... once I can get to it.

Thanks.