Re: [PATCH] md/raid1: properly indicate failure when ending a failed write request

From: Song Liu <song@kernel.org>
Date: 2021-04-22 05:59:06

On Wed, Apr 21, 2021 at 10:38 AM Paul Clements [off-list ref] wrote:
>
> On Tue, Apr 20, 2021, 7:49 PM Song Liu [off-list ref] wrote:
>>
>> On Tue, Apr 20, 2021 at 3:05 PM Paul Clements [off-list ref] wrote:
>>>
>>> This patch addresses a data corruption bug in raid1 arrays using bitmaps.
>>> Without this fix, the bitmap bits for the failed I/O end up being cleared.
>>
>> I think this only happens when we re-add a faulty drive?
>
> Yes, the bitmap gets cleared when the disk is marked faulty or a write
> error occurs. Then when the disk is re-added, the bitmap-based resync
> is, of course, not accurate.
>
> Is there another way to deal with a transient, transport-based error,
> other than this?
>
> For instance, I'm using nbd as one of the mirror legs. In that case,
> assuming the failures that lead to the device being marked faulty are
> just transport/network issues, we want the resync to be able to
> correctly deal with this. It has worked this way for a long time.
> There was a fairly recent commit
> (eeba6809d8d58908b5ed1b5ceb5fcb09a98a7cad) that rearranged the code
> (previously all write failures were retried via flagging with
> R1BIO_WriteError).

So I guess we need "Fixes eeba6809d8d589"?

CC Yufen, who authored the above patch.

>
> Does the patch present a problem in some other scenario?

I don't think this presents any problem.

Applied to md-next (so no need to resend for the Fixes tag).

Thanks,
Song
