Thread: 28 messages, 7 authors, 2019-03-03

Re: [RFC PATCH v2 0/9] Block/XFS: Support alternative mirror device retry

From: jianchao.wang <hidden>
Date: 2019-02-19 01:27:06
Also in: linux-fsdevel, linux-xfs


On 2/18/19 4:08 PM, jianchao.wang wrote:
Hi Bob

On 2/13/19 5:50 PM, Bob Liu wrote:
Motivation:
When a filesystem data/metadata checksum mismatch occurs, lower block devices may
hold other correct copies. For example, if XFS successfully reads a metadata buffer off a raid1 but
decides that the metadata is garbage, today it will shut down the entire
filesystem without trying any of the other mirrors.  This is a severe
loss of service, and we propose these patches to have XFS try harder to
avoid failure.

This patch series prototypes the mirror retry idea by:
* Adding @nr_mirrors to struct request_queue, similar to
  blk_queue_nonrot(), so a filesystem can grab the device request queue and check
  the maximum number of mirrors the block device has.
  Helper functions were also added to get/set nr_mirrors.

* Introducing bi_rd_hint just like bi_write_hint, but bi_rd_hint is a long bitmap
  in order to support the stacked layer case.
Why do we need a bitmap to know which underlying device has been tried?
For example, consider the following scenario:

                    md8
                   / | \
               sda sdb sdc

If raid1 reads the data from sda and the fs checks it and finds it corrupted,
then we may just need to let raid1 know that the data came from sda. Based on this
hint, raid1 could handle it with handle_read_error to try another replica and fix the
error.
This doesn't work.
The md raid1 can only see IO success or failure, so fix_read_error won't fix this.
Sorry for the noise.

Thanks
Jianchao
If this is feasible, we just need to modify the bio as follows and needn't add any
bytes to it.

struct bio {
    ...
    union {
        unsigned short bi_write_hint;
        unsigned short bi_read_hint;
    };
    ...
};
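
As a rough illustration of the "needn't add any bytes" point, the sketch below compares two stand-in structs in userspace C. The struct names hint_separate and hint_shared and the two helper functions are hypothetical and exist only for this example; this is not the kernel's struct bio. It assumes a C11 compiler for the anonymous union.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: each hint gets its own field. */
struct hint_separate {
    unsigned short bi_write_hint;
    unsigned short bi_read_hint;
};

/* Hypothetical stand-in: both hints share storage via an anonymous union. */
struct hint_shared {
    union {
        unsigned short bi_write_hint;
        unsigned short bi_read_hint;
    };
};

/* Size when the two hints are stored side by side. */
static size_t separate_size(void)
{
    return sizeof(struct hint_separate);
}

/* Size when the two hints overlay the same bytes. */
static size_t shared_size(void)
{
    return sizeof(struct hint_shared);
}
```

The shared layout takes the space of a single unsigned short, at the cost that one bio can carry only one of the two hints at a time.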

Thanks
Jianchao
* Modify md/raid1 to support this retry feature.

* Adapt xfs to use this feature.
  If the read verify fails, we loop over the available mirrors and retry the read.

* Rewrite retried read
  When the read verification fails but the retry succeeds,
  write the buffer back to correct the bad mirror.

* Add tracepoints and logging to alternate device retry.
  This patch adds new log entries and trace points to the alternate device retry
  error path.
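
The retry and rewrite steps listed above can be sketched as a small userspace simulation. This is only an illustration under stated assumptions, not the code from the patches: struct mirror_set, MAX_MIRRORS, pick_mirror() and read_with_retry() are hypothetical names, the "tried" bitmap merely plays the role that the cover letter assigns to the read hint, and verification is reduced to a per-mirror boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MIRRORS 8 /* hypothetical cap, for this sketch only */

/* Hypothetical stand-in for a mirrored device: good[i] says whether
 * mirror i returns data that passes the filesystem verifier. */
struct mirror_set {
    unsigned int nr_mirrors;
    bool good[MAX_MIRRORS];
};

/* Pick the first mirror whose bit is not yet set in the tried bitmap.
 * Returns the mirror index, or -1 when every mirror has been tried. */
static int pick_mirror(const struct mirror_set *ms, unsigned long tried)
{
    for (unsigned int i = 0; i < ms->nr_mirrors; i++)
        if (!(tried & (1UL << i)))
            return (int)i;
    return -1;
}

/* Loop over the mirrors until one returns data that verifies.
 * On success, returns that mirror's index, marks the earlier bad mirrors
 * as rewritten (good), and reports them in *repair_mask.
 * On failure, returns -1 and *repair_mask holds every mirror that failed. */
static int read_with_retry(struct mirror_set *ms, unsigned long *repair_mask)
{
    unsigned long tried = 0;
    int m;

    assert(ms->nr_mirrors <= MAX_MIRRORS);
    *repair_mask = 0;

    while ((m = pick_mirror(ms, tried)) >= 0) {
        tried |= 1UL << m;
        if (ms->good[m]) {
            /* Retry succeeded: write the good buffer back over bad copies. */
            for (unsigned int i = 0; i < ms->nr_mirrors; i++)
                if (*repair_mask & (1UL << i))
                    ms->good[i] = true;
            return m;
        }
        *repair_mask |= 1UL << m; /* verification failed on this mirror */
    }
    return -1;
}
```

For instance, with three mirrors where only the last one holds valid data, read_with_retry() returns 2 and reports a repair mask of 0x3 for the two mirrors that were rewritten.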

Changes v2:
- No longer reuse bi_write_hint
- Stacked layer support (see patch 4/9)
- Other fixes from feedback

Allison Henderson (5):
  Add b_alt_retry to xfs_buf
  xfs: Add b_rd_hint to xfs_buf
  xfs: Add device retry
  xfs: Rewrite retried read
  xfs: Add tracepoints and logging to alternate device retry

Bob Liu (4):
  block: add nr_mirrors to request_queue
  block: add rd_hint to bio and request
  md:raid1: set mirrors correctly
  md:raid1: rd_hint support and consider stacked layer case

 Documentation/block/biodoc.txt |   3 +
 block/bio.c                    |   1 +
 block/blk-core.c               |   4 ++
 block/blk-merge.c              |   6 ++
 block/blk-settings.c           |  24 +++++++
 block/bounce.c                 |   1 +
 drivers/md/raid1.c             | 123 ++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_buf.c               |  58 +++++++++++++++-
 fs/xfs/xfs_buf.h               |  14 ++++
 fs/xfs/xfs_trace.h             |   6 +-
 include/linux/blk_types.h      |   1 +
 include/linux/blkdev.h         |   4 ++
 include/linux/types.h          |   3 +
 13 files changed, 244 insertions(+), 4 deletions(-)