Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry

[RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry · Bob Liu <hidden> · 2019-02-13
[RFC PATCH v2 1/9] block: add nr_mirrors to request_queue · Bob Liu <hidden> · 2019-02-13
Re: [RFC PATCH v2 1/9] block: add nr_mirrors to request_queue · Andreas Dilger <hidden> · 2019-02-13
Re: [RFC PATCH v2 1/9] block: add nr_mirrors to request_queue · "Theodore Y. Ts'o" <tytso@mit.edu> · 2019-02-13
Re: [RFC PATCH v2 1/9] block: add nr_mirrors to request_queue · Bob Liu <hidden> · 2019-02-14
Re: [RFC PATCH v2 1/9] block: add nr_mirrors to request_queue · "Theodore Y. Ts'o" <tytso@mit.edu> · 2019-02-18
[RFC PATCH v2 2/9] block: add rd_hint to bio and request · Bob Liu <hidden> · 2019-02-13
Re: [RFC PATCH v2 2/9] block: add rd_hint to bio and request · Jens Axboe <axboe@kernel.dk> · 2019-02-13
Re: [RFC PATCH v2 2/9] block: add rd_hint to bio and request · Bob Liu <hidden> · 2019-02-14
[RFC PATCH v2 3/9] md:raid1: set mirrors correctly · Bob Liu <hidden> · 2019-02-13
[RFC PATCH v2 4/9] md:raid1: rd_hint support and consider stacked layer case · Bob Liu <hidden> · 2019-02-13
[RFC PATCH v2 5/9] Add b_alt_retry to xfs_buf · Bob Liu <hidden> · 2019-02-13
[RFC PATCH v2 6/9] xfs: Add b_rd_hint to xfs_buf · Bob Liu <hidden> · 2019-02-13
[RFC PATCH v2 7/9] xfs: Add device retry · Bob Liu <hidden> · 2019-02-13
[RFC PATCH v2 9/9] xfs: Add tracepoints and logging to alternate device retry · Bob Liu <hidden> · 2019-02-13
[RFC PATCH v2 8/9] xfs: Rewrite retried read · Bob Liu <hidden> · 2019-02-13
Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry · jianchao.wang <hidden> · 2019-02-18
Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry · jianchao.wang <hidden> · 2019-02-19
Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry · Dave Chinner <david@fromorbit.com> · 2019-02-18
Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry · Darrick J. Wong <hidden> · 2019-02-19
Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry · Dave Chinner <david@fromorbit.com> · 2019-02-19
Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry · Bob Liu <hidden> · 2019-02-28
Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry · Dave Chinner <david@fromorbit.com> · 2019-02-28
Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry · Bob Liu <hidden> · 2019-03-03
Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry · Dave Chinner <david@fromorbit.com> · 2019-03-03
Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry · Andreas Dilger <hidden> · 2019-02-28
Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry · Bob Liu <hidden> · 2019-03-01
Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry · Dave Chinner <david@fromorbit.com> · 2019-03-03

From: Darrick J. Wong <hidden>
Date: 2019-02-19 03:00:47
Also in: linux-fsdevel, linux-xfs

On Tue, Feb 19, 2019 at 08:31:50AM +1100, Dave Chinner wrote:

On Wed, Feb 13, 2019 at 05:50:35PM +0800, Bob Liu wrote:

Motivation:
When a filesystem data or metadata checksum mismatches, the lower block
devices may hold other correct copies. For example, if XFS successfully reads
a metadata buffer off a raid1 but decides that the metadata is garbage, today
it will shut down the entire filesystem without trying any of the other
mirrors.  This is a severe loss of service, and we propose these patches to
have XFS try harder to avoid failure.

This patch series prototypes the mirror retry idea by:
* Adding @nr_mirrors to struct request_queue, similar to
  blk_queue_nonrot(), so that a filesystem can grab the device request queue
  and check the maximum number of mirrors the block device has.
  Helper functions were also added to get/set nr_mirrors.

* Introducing bi_rd_hint, just like bi_write_hint, except that bi_rd_hint is
  a long bitmap in order to support the stacked layer case.

* Modifying md/raid1 to support this retry feature.

* Adapting XFS to use this feature.
  If the read verify fails, we loop over the available mirrors and retry the read.
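
The retry loop in the last bullet can be pictured with a minimal user-space
sketch. It is not the patch code: `struct mirror_dev`, `read_mirror()`,
`verify_block()` and `read_with_retry()` are hypothetical names, the XOR check
stands in for a real XFS verifier, and a plain mirror index stands in for the
bi_rd_hint bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
#define MAX_MIRRORS 4
#define BLOCK_SIZE  8

struct mirror_dev {
    unsigned int nr_mirrors;                  /* models the proposed queue field */
    uint8_t copies[MAX_MIRRORS][BLOCK_SIZE];  /* one block per mirror leg */
};

/* Toy verifier: the last byte must equal the XOR of the preceding bytes. */
static int verify_block(const uint8_t *buf)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < BLOCK_SIZE - 1; i++)
        sum ^= buf[i];
    return sum == buf[BLOCK_SIZE - 1];
}

/* Read the copy held by the mirror leg that the hint selects. */
static void read_mirror(const struct mirror_dev *dev, unsigned int hint,
                        uint8_t *out)
{
    assert(hint < dev->nr_mirrors);
    memcpy(out, dev->copies[hint], BLOCK_SIZE);
}

/*
 * Filesystem-driven retry: try each mirror in turn until one copy verifies.
 * Returns the index of the mirror that supplied good data, or -1 if every
 * copy failed verification.
 */
static int read_with_retry(const struct mirror_dev *dev, uint8_t *out)
{
    for (unsigned int hint = 0; hint < dev->nr_mirrors; hint++) {
        read_mirror(dev, hint, out);
        if (verify_block(out))
            return (int)hint;
    }
    return -1;
}
```

In this model the caller has to know how many mirrors sit underneath it, which
is the coupling the replies in this thread question.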

Why does the filesystem have to iterate every single possible
combination of devices that are underneath it?

Wouldn't it be much simpler to be able to attach a verifier
function to the bio, and have each layer that gets called iterate
over all its copies internally until the verifier function passes
or all copies are exhausted?

This works for stacked mirrors - it can pass the higher layer
verifier down as far as necessary. It can work for RAID5/6, too, by
having that layer supply its own verifier for reads that verifies
parity and can reconstruct on failure; then, once it has reconstructed
a valid stripe, it can run the verifier that was supplied to it from
above, etc.

i.e. I don't see why only filesystems should drive retries or have to
be aware of the underlying storage stacking. ISTM that each
layer of the storage stack should be able to verify what has been
returned to it is valid independently of the higher layer
requirements. The only difference from a caller point of view should
be submit_bio(bio); vs submit_bio_verify(bio, verifier_cb_func);
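
As a rough illustration of the verifier idea, here is a minimal user-space
sketch in which every layer walks its own copies and the caller only supplies
a callback. `struct layer`, `verifier_fn`, `submit_read_verify()` and
`magic_verifier()` are hypothetical names for this model and do not
correspond to any existing block-layer interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLK      4
#define MAX_LEGS 4

/* Caller-supplied check; returns nonzero when the buffer is valid. */
typedef int (*verifier_fn)(const uint8_t *buf);

/* Hypothetical storage layer: a leaf device or a mirror over other layers. */
struct layer {
    size_t nr_legs;                /* 0 means this is a leaf device */
    struct layer *legs[MAX_LEGS];  /* lower layers, used by mirrors only */
    uint8_t data[BLK];             /* block contents, used by leaves only */
};

/*
 * Read one block, letting each layer iterate over its own copies until the
 * verifier passes or the copies are exhausted. A NULL verifier accepts
 * whatever is read. Returns 0 on success, -1 if no copy verified.
 */
static int submit_read_verify(const struct layer *l, uint8_t *out,
                              verifier_fn verify)
{
    assert(l->nr_legs <= MAX_LEGS);

    if (l->nr_legs == 0) {
        memcpy(out, l->data, BLK);
        return (verify == NULL || verify(out)) ? 0 : -1;
    }

    /* Mirror: pass the same verifier down to each leg in turn. */
    for (size_t i = 0; i < l->nr_legs; i++)
        if (submit_read_verify(l->legs[i], out, verify) == 0)
            return 0;
    return -1;
}

/* Toy filesystem verifier: a valid block starts with the byte 0x58. */
static int magic_verifier(const uint8_t *buf)
{
    return buf[0] == 0x58;
}
```

The caller makes a single call and never learns how many mirrors, or how many
stacked levels, were consulted underneath it.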

What if, instead of constructing a giant chain of verifier calls, we
simply had a return value from ->bi_end_io that would then be returned
from bio_endio()?  Stacked things like dm-linear would have to know how
to connect the upper endio to the lower endio though.  And that could
have its downsides, too.  How long do we tie up resources in the scsi
layer while upper levels are busy running verification functions...?
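
A minimal user-space sketch of that alternative, under loud assumptions:
`struct toy_bio`, `toy_bio_endio()`, `TOY_RETRY` and the layer functions are
hypothetical, and the real ->bi_end_io returns void. The completion handler's
return value travels back down, and the pass-through layer has to forward it
by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RETRY 1   /* hypothetical verdict: "data rejected, try another copy" */

struct toy_bio {
    int (*end_io)(struct toy_bio *bio);  /* 0 = accepted, nonzero = retry */
    void *private;                       /* for a clone: the parent bio */
    uint8_t payload;                     /* one-byte stand-in for the data */
};

/* Complete a bio and hand the owner's verdict back to the completing layer. */
static int toy_bio_endio(struct toy_bio *bio)
{
    return bio->end_io ? bio->end_io(bio) : 0;
}

/* Filesystem completion handler: the toy check accepts only 0x58. */
static int fs_end_io(struct toy_bio *bio)
{
    return bio->payload == 0x58 ? 0 : TOY_RETRY;
}

/* Pass-through layer: connect the lower endio to the upper endio. */
static int linear_clone_end_io(struct toy_bio *clone)
{
    struct toy_bio *parent = clone->private;

    parent->payload = clone->payload;
    return toy_bio_endio(parent);
}

/* Mirror layer: complete with each copy until the owner accepts one. */
static int mirror_read(struct toy_bio *bio, const uint8_t *copies, size_t nr)
{
    assert(copies != NULL);
    for (size_t i = 0; i < nr; i++) {
        bio->payload = copies[i];
        if (toy_bio_endio(bio) == 0)
            return (int)i;   /* index of the accepted copy */
    }
    return -1;               /* every copy was rejected */
}

/* Linear layer stacked on the mirror: clone the bio and submit it below. */
static int linear_read(struct toy_bio *parent, const uint8_t *copies, size_t nr)
{
    struct toy_bio clone = { .end_io = linear_clone_end_io, .private = parent };

    return mirror_read(&clone, copies, nr);
}
```

Note that `mirror_read()` sits inside its loop while the upper handler runs,
which is the resource-holding concern raised above.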

Hmmmmmmmmm....

--D

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
