Thread (25 messages) 25 messages, 4 authors, 2021-03-18

Re: [PATCH v7 2/3] block: add bdev_interposer

From: Sergei Shtepa <hidden>
Date: 2021-03-17 12:23:23
Also in: dm-devel, linux-block, lkml

The 03/17/2021 06:03, Ming Lei wrote:
On Tue, Mar 16, 2021 at 07:35:44PM +0300, Sergei Shtepa wrote:
quoted
The 03/16/2021 11:09, Ming Lei wrote:
quoted
On Fri, Mar 12, 2021 at 06:44:54PM +0300, Sergei Shtepa wrote:
quoted
bdev_interposer allows to redirect bio requests to another devices.

Signed-off-by: Sergei Shtepa <redacted>
---
 block/bio.c               |  2 ++
 block/blk-core.c          | 57 +++++++++++++++++++++++++++++++++++++++
 block/genhd.c             | 54 +++++++++++++++++++++++++++++++++++++
 include/linux/blk_types.h |  3 +++
 include/linux/blkdev.h    |  9 +++++++
 5 files changed, 125 insertions(+)
diff --git a/block/bio.c b/block/bio.c
index a1c4d2900c7a..0bfbf06475ee 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -640,6 +640,8 @@ void __bio_clone_fast(struct bio *bio, struct bio *bio_src)
 		bio_set_flag(bio, BIO_THROTTLED);
 	if (bio_flagged(bio_src, BIO_REMAPPED))
 		bio_set_flag(bio, BIO_REMAPPED);
+	if (bio_flagged(bio_src, BIO_INTERPOSED))
+		bio_set_flag(bio, BIO_INTERPOSED);
 	bio->bi_opf = bio_src->bi_opf;
 	bio->bi_ioprio = bio_src->bi_ioprio;
 	bio->bi_write_hint = bio_src->bi_write_hint;
diff --git a/block/blk-core.c b/block/blk-core.c
index fc60ff208497..da1abc4c27a9 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1018,6 +1018,55 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
 	return ret;
 }
 
+static noinline blk_qc_t submit_bio_interposed(struct bio *bio)
+{
+	blk_qc_t ret = BLK_QC_T_NONE;
+	struct bio_list bio_list[2] = { };
+	struct gendisk *orig_disk;
+
+	if (current->bio_list) {
+		bio_list_add(&current->bio_list[0], bio);
+		return BLK_QC_T_NONE;
+	}
+
+	orig_disk = bio->bi_bdev->bd_disk;
+	if (unlikely(bio_queue_enter(bio)))
+		return BLK_QC_T_NONE;
+
+	current->bio_list = bio_list;
+
+	do {
+		struct block_device *interposer = bio->bi_bdev->bd_interposer;
+
+		if (unlikely(!interposer)) {
+			/* interposer was removed */
+			bio_list_add(&current->bio_list[0], bio);
+			break;
+		}
+		/* assign bio to interposer device */
+		bio_set_dev(bio, interposer);
+		bio_set_flag(bio, BIO_INTERPOSED);
+
+		if (!submit_bio_checks(bio))
+			break;
+		/*
+		 * Because the current->bio_list is initialized,
+		 * the submit_bio callback will always return BLK_QC_T_NONE.
+		 */
+		interposer->bd_disk->fops->submit_bio(bio);
Given original request queue may become live when calling attach() and
detach(), see below comment. bdev_interposer_detach() may be run
when running ->submit_bio(), meantime the interposer device is
gone during the period, then kernel oops.
I think that since the bio_queue_enter() function was called,
q->q_usage_counter will not allow the critical code in the attach/detach
functions to be executed, which is located between the blk_freeze_queue
and blk_unfreeze_queue calls.
Please correct me if I'm wrong.
quoted
quoted
+	} while (false);
+
+	current->bio_list = NULL;
+
+	blk_queue_exit(orig_disk->queue);
+
+	/* Resubmit remaining bios */
+	while ((bio = bio_list_pop(&bio_list[0])))
+		ret = submit_bio_noacct(bio);
+
+	return ret;
+}
+
 /**
  * submit_bio_noacct - re-submit a bio to the block device layer for I/O
  * @bio:  The bio describing the location in memory and on the device.
@@ -1029,6 +1078,14 @@ static blk_qc_t __submit_bio_noacct_mq(struct bio *bio)
  */
 blk_qc_t submit_bio_noacct(struct bio *bio)
 {
+	/*
+	 * Checking the BIO_INTERPOSED flag is necessary so that the bio
+	 * created by the bdev_interposer do not get to it for processing.
+	 */
+	if (bdev_has_interposer(bio->bi_bdev) &&
+	    !bio_flagged(bio, BIO_INTERPOSED))
+		return submit_bio_interposed(bio);
+
 	if (!submit_bio_checks(bio))
 		return BLK_QC_T_NONE;
 
diff --git a/block/genhd.c b/block/genhd.c
index c55e8f0fced1..c840ecffea68 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -30,6 +30,11 @@
 static struct kobject *block_depr;
 
 DECLARE_RWSEM(bdev_lookup_sem);
+/*
+ * Prevents different block-layer interposers from attaching or detaching
+ * to the block device at the same time.
+ */
+static DEFINE_MUTEX(bdev_interposer_attach_lock);
 
 /* for extended dynamic devt allocation, currently only one major is used */
 #define NR_EXT_DEVT		(1 << MINORBITS)
@@ -1940,3 +1945,52 @@ static void disk_release_events(struct gendisk *disk)
 	WARN_ON_ONCE(disk->ev && disk->ev->block != 1);
 	kfree(disk->ev);
 }
+
+int bdev_interposer_attach(struct block_device *original,
+			   struct block_device *interposer)
+{
+	int ret = 0;
+
+	if (WARN_ON(((!original) || (!interposer))))
+		return -EINVAL;
+	/*
+	 * interposer should be simple, no a multi-queue device
+	 */
+	if (!interposer->bd_disk->fops->submit_bio)
+		return -EINVAL;
+
+	if (WARN_ON(!blk_mq_is_queue_frozen(original->bd_disk->queue)))
+		return -EPERM;
The original request queue may become live now...
Yes.
I will remove the blk_mq_is_queue_frozen() function and use a different
approach.
Looks what attach and detach needs is that queue is kept as frozen state
instead of being froze simply at the beginning of the two functions, so
you can simply call freeze/unfreeze inside the two functions.

But what if 'original' isn't a MQ queue?  queue usage counter is just
grabed when calling ->submit_bio(), and queue freeze doesn't guarantee there
isn't any io activity, is that a problem for bdev_interposer use case?

-- 
Ming
It makes sense to add freeze_bdev/thaw_bdev. This will be useful.
For the main file systems, the freeze functions are defined 
sb->s_op->freeze_super() or sb - >s_op->freeze_fs()
(btrfs, ext2, ext4, f2fs, jfs, nilfs2, reiserfs, xfs).
If the file system is frozen, then no new requests should be received.

But if the file system does not support freeze or the disk is used without
a file system, as for some databases, freeze_bdev seems useless to me.
In this case, we will need to stop working with the disk from user-space,
for example, to freeze the database itself.

I can add dm_suspend() before bdev_interposer_detach(). This will ensure that
all intercepted requests have been processed. Applying dm_suspend() before
bdev_interposer_attach() is pointless. The attachment is made when the target
is created, and at this time the target is not ready to work yet.
There shouldn't be any bio requests, I suppose. In addition,
sb->s_op->freeze_fs() for the interposer will not be called, because the file
system is not mounted for the interposer device. It should not be able to
be mounted. To do this, I will add an exclusive opening of the interposer
device.

I'll add freeze_bdev() for the original device and dm_suspend() for the
interposer to the DM code. For normal operation of bdev_interposer,
it is enough to transfer blk_mq_freeze_queue and blk_mq_quiesce_queue to
bdev_interposer_attach/bdev_interposer_detach.
The lock on the counter q->q_usage_counter is enough to not catch NULL in
bd_interposer.

Do you think this is enough?
I think there are no other ways to stop the block device queue.

-- 
Sergei Shtepa
Veeam Software developer.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help