Thread: 13 messages, 5 authors, 2014-12-04

Re: Raid5 drive fail during grow and no backup

From: Phil Turmel <hidden>
Date: 2014-12-04 20:02:51

Hi Phillip,

On 12/04/2014 02:29 PM, Phillip Susi wrote:
> On 11/7/2014 10:36 PM, Phil Turmel wrote:
>> However, if the device with the bad sector is trying to recover
>> longer than the linux low level driver's timeout, bad things^TM
>> happen. Specifically, the driver resets the SATA (or SCSI)
>> connection and attempts to reconnect.  During this brief time, it
>> will not accept further I/O, so the write back of the reconstructed
>> data fails.  Then the device has experienced a *write* error, so MD
>> fails the drive.  This is the out-of-the-box behavior of
>> consumer-grade drives in raid arrays.
>
> What?  During the recovery action ( reset and retry ), a write being
> issued to the drive should just sit in the request queue until after
> the drive finishes being reset; it should not just be failed outright.

It's been a few years since I've directly tested this myself, but the
failure I described is what would happen.  The window in which the write
can be rejected might be small, but it's there (unless a recent fix has
closed it).

I'm not an expert on the driver stack, though.  YMMV.

Phil
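
The timeout mismatch discussed above can be checked with a short script.
What follows is only a sketch under stated assumptions, not something
taken from this thread: the function names are hypothetical, the driver
timeout is read from the sysfs file /sys/block/<dev>/device/timeout (in
seconds), and the drive's error recovery limit is passed in by hand in
tenths of a second, the unit that "smartctl -l scterc" reports.  A drive
with no limit set is treated as able to retry indefinitely, which is an
assumption made for illustration.

```python
from pathlib import Path
from typing import Optional


def read_driver_timeout(device: str, sysfs_root: str = "/sys/block") -> int:
    """Return the kernel's command timeout for a disk, in seconds.

    sysfs_root is a parameter only so the function can be exercised
    against a fake directory tree.
    """
    path = Path(sysfs_root, device, "device", "timeout")
    return int(path.read_text().strip())


def recovery_may_outlast_driver(driver_timeout_s: int,
                                erc_deciseconds: Optional[int]) -> bool:
    """True if the drive could still be retrying when the driver gives up.

    erc_deciseconds is the drive's error recovery limit in tenths of a
    second.  None means no limit is set; this sketch assumes the drive
    may then retry for longer than any driver timeout.
    """
    if erc_deciseconds is None:
        return True
    return erc_deciseconds / 10.0 >= driver_timeout_s
```

For example, recovery_may_outlast_driver(read_driver_timeout("sda"), 70)
asks whether a drive limited to 7.0 seconds of recovery could exceed the
driver timeout configured for sda.  A True result means the link-reset
scenario described in the quoted text is possible.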