Re: [RFC PATCH] raid1: reset 'bi_next' before reuse the bio

From: NeilBrown <hidden>
Date: 2017-04-04 22:18:35
Also in: lkml
Subsystem: software raid (multiple disks) support, the rest · Maintainers: Song Liu, Yu Kuai, Linus Torvalds

On Tue, Apr 04 2017, Michael Wang wrote:

During testing we found that a sync read bio can go through the
following path:

  md_do_sync()
    sync_request()
      generic_make_request()
        blk_queue_bio()
          blk_attempt_plug_merge()
            bio->bi_next CHAINED HERE

  ...

  raid1d()
    sync_request_write()
      fix_sync_read_error()
        if FailFast && Faulty
          bio->bi_end_io = end_sync_write
      generic_make_request()
        BUG_ON(bio->bi_next)

This requires all of the following conditions:
  * the bio was merged at least once
  * the read disk has FailFast enabled
  * the read disk is Faulty

Since the block layer does not reset 'bi_next' after the bio has
completed inside a request, the reused bio trips the BUG_ON above.

This patch simply resets bi_next before we reuse the bio.

Signed-off-by: Michael Wang <redacted>
---
 drivers/md/raid1.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7d67235..0554110 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c

@@ -1986,11 +1986,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 		/* Don't try recovering from here - just fail it
 		 * ... unless it is the last working device of course */
 		md_error(mddev, rdev);
-		if (test_bit(Faulty, &rdev->flags))
+		if (test_bit(Faulty, &rdev->flags)) {
 			/* Don't try to read from here, but make sure
 			 * put_buf does it's thing
 			 */
 			bio->bi_end_io = end_sync_write;
+			bio->bi_next = NULL;
+		}
 	}
 
 	while(sectors) {
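
The failure path described in the commit message can be modelled in userspace. This is only a minimal sketch, not kernel code: `struct bio` is reduced to its `bi_next` pointer, and `plug_merge()`, `complete()` and `submit_would_bug()` are hypothetical stand-ins for blk_attempt_plug_merge(), bio completion, and the BUG_ON(bio->bi_next) check in generic_make_request().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's struct bio: only the chaining pointer. */
struct bio {
	struct bio *bi_next;
};

/* Hypothetical model of a plug merge: 'head' gets chained to 'tail'. */
static void plug_merge(struct bio *head, struct bio *tail)
{
	head->bi_next = tail;
}

/*
 * Hypothetical model of completion, following the commit message:
 * nothing resets bi_next when the bio is done.
 */
static void complete(struct bio *bio)
{
	(void)bio; /* bi_next is deliberately left untouched */
}

/* Models BUG_ON(bio->bi_next): true means resubmission would trip it. */
static bool submit_would_bug(const struct bio *bio)
{
	return bio->bi_next != NULL;
}
```

In this model, a bio that was merged and then completed still carries a stale `bi_next`, so resubmitting it trips the check unless the pointer is cleared first or the bio is never resubmitted.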


Ah - I see what is happening now.  I was looking at the vanilla 4.4
code, which doesn't have the failfast changes.

I don't think your patch is correct though.  We really shouldn't be
re-using that bio, and setting bi_next to NULL just hides the bug.  It
doesn't fix it.
As the rdev is now Faulty, it doesn't make sense for
sync_request_write() to submit a write request to it.

Can you confirm that this works, please?

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d2d8b8a5bd56..219f1e1f1d1d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c

@@ -2180,6 +2180,8 @@ static void sync_request_write(struct mddev *mddev, struct r1bio *r1_bio)
 		     (i == r1_bio->read_disk ||
 		      !test_bit(MD_RECOVERY_SYNC, &mddev->recovery))))
 			continue;
+		if (test_bit(Faulty, &conf->mirrors[i].rdev->flags))
+			continue;
 
 		bio_set_op_attrs(wbio, REQ_OP_WRITE, 0);
 		if (test_bit(FailFast, &conf->mirrors[i].rdev->flags))
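
The effect of the added check can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the raid1 code: `struct slot`, `FAULTY_BIT`, the `wants_write` field (a stand-in for "passed the existing skip condition") and `count_sync_writes()` are all hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FAULTY_BIT (1UL << 0)	/* hypothetical flag bit, not the kernel's value */

/* Toy per-mirror state for one sync request. */
struct slot {
	bool wants_write;	/* slot passed the existing skip condition */
	unsigned long flags;	/* device flags; only FAULTY_BIT is modelled */
};

/*
 * Counts the writes the loop would submit. With skip_faulty set, slots
 * whose device is Faulty are skipped, mirroring the added 'continue'.
 */
static int count_sync_writes(const struct slot *slots, size_t n, bool skip_faulty)
{
	int submitted = 0;

	for (size_t i = 0; i < n; i++) {
		if (!slots[i].wants_write)
			continue;
		if (skip_faulty && (slots[i].flags & FAULTY_BIT))
			continue;
		submitted++;
	}
	return submitted;
}
```

With one Faulty slot whose bio was retargeted as a write and one healthy slot, the model submits two writes without the check and one with it, so the reused bio is never resubmitted.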

Thanks,
NeilBrown

