Thread (24 messages) 24 messages, 2 authors, 2020-04-22

Re: [RESEND RFC PATCH 2/8] block: Allow sending a batch of requests from the scheduler to hardware

From: Baolin Wang <baolin.wang7@gmail.com>
Date: 2020-03-20 10:27:58
Also in: linux-mmc, lkml

Hi Ming,

On Wed, Mar 18, 2020 at 6:26 PM Baolin Wang [off-list ref] wrote:
Hi Ming,

On Wed, Mar 18, 2020 at 6:01 PM Ming Lei [off-list ref] wrote:
quoted
On Mon, Mar 16, 2020 at 06:01:19PM +0800, Baolin Wang wrote:
quoted
As we know, some SD/MMC host controllers can support packed request,
that means we can package several requests to host controller at one
time to improve performence. So the hardware driver expects the blk-mq
can dispatch a batch of requests at one time, and driver can use bd.last
to indicate if it is the last request in the batch to help to combine
requests as much as possible.

Thus we should add batch requests setting from the block driver to tell
the scheduler how many requests can be dispatched in a batch, as well
as changing the scheduler to dispatch more than one request if setting
the maximum batch requests number.
I feel this batch dispatch style is more complicated, and some other
drivers(virtio blk/scsi) still may get benefit if we can pass real 'last' flag in
.queue_rq().

So what about the following way by extending .commit_rqs() to this usage?
And you can do whatever batch processing in .commit_rqs() which will be
guaranteed to be called if BLK_MQ_F_FORCE_COMMIT_RQS is set by driver.
I'm very appreciated for your good suggestion, which is much simpler than mine.
It seems to solve my problem, and I will try it on my platform to see
if it can work and give you the feadback. Thanks again.
I tried your approach on my platform, but met some problems, see below.
quoted
diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index 856356b1619e..cd2bbe56f83f 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -85,11 +85,12 @@ void blk_mq_sched_restart(struct blk_mq_hw_ctx *hctx)
  * its queue by itself in its completion handler, so we don't need to
  * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
  */
-static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
 {
        struct request_queue *q = hctx->queue;
        struct elevator_queue *e = q->elevator;
        LIST_HEAD(rq_list);
+       bool ret = false;

        do {
                struct request *rq;
@@ -112,7 +113,10 @@ static void blk_mq_do_dispatch_sched(struct blk_mq_hw_ctx *hctx)
                 * in blk_mq_dispatch_rq_list().
                 */
                list_add(&rq->queuelist, &rq_list);
-       } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
+               ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
+       } while (ret);
+
+       return ret;
 }

 static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
@@ -131,11 +135,12 @@ static struct blk_mq_ctx *blk_mq_next_ctx(struct blk_mq_hw_ctx *hctx,
  * its queue by itself in its completion handler, so we don't need to
  * restart queue if .get_budget() returns BLK_STS_NO_RESOURCE.
  */
-static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
+static bool blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)
 {
        struct request_queue *q = hctx->queue;
        LIST_HEAD(rq_list);
        struct blk_mq_ctx *ctx = READ_ONCE(hctx->dispatch_from);
+       bool ret = false;

        do {
                struct request *rq;
@@ -161,10 +166,12 @@ static void blk_mq_do_dispatch_ctx(struct blk_mq_hw_ctx *hctx)

                /* round robin for fair dispatch */
                ctx = blk_mq_next_ctx(hctx, rq->mq_ctx);
-
-       } while (blk_mq_dispatch_rq_list(q, &rq_list, true));
+               ret = blk_mq_dispatch_rq_list(q, &rq_list, true);
+       } while (ret);

        WRITE_ONCE(hctx->dispatch_from, ctx);
+
+       return ret;
 }

 void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
@@ -173,6 +180,7 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
        struct elevator_queue *e = q->elevator;
        const bool has_sched_dispatch = e && e->type->ops.dispatch_request;
        LIST_HEAD(rq_list);
+       bool dispatch_ret;

        /* RCU or SRCU read lock is needed before checking quiesced flag */
        if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
@@ -206,20 +214,26 @@ void blk_mq_sched_dispatch_requests(struct blk_mq_hw_ctx *hctx)
         */
        if (!list_empty(&rq_list)) {
                blk_mq_sched_mark_restart_hctx(hctx);
-               if (blk_mq_dispatch_rq_list(q, &rq_list, false)) {
+               dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
+               if (dispatch_ret) {
                        if (has_sched_dispatch)
-                               blk_mq_do_dispatch_sched(hctx);
+                               dispatch_ret = blk_mq_do_dispatch_sched(hctx);
If we dispatched a request successfully by blk_mq_dispatch_rq_list(),
and got dispatch_ret = true now. Then we will try to dispatch more
reuqests from scheduler by blk_mq_do_dispatch_sched(), but if now no
more requests in scheduler, then we will got dispatch_ret = false. In
this case, we will not issue commit_rqs() to tell the hardware to
handle previous request dispatched from &rq_list.

So I think we should not overlap the 'dispatch_ret'? Or do you have
any other thoughts to fix?
quoted
                        else
-                               blk_mq_do_dispatch_ctx(hctx);
+                               dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
                }
        } else if (has_sched_dispatch) {
-               blk_mq_do_dispatch_sched(hctx);
+               dispatch_ret = blk_mq_do_dispatch_sched(hctx);
        } else if (hctx->dispatch_busy) {
                /* dequeue request one by one from sw queue if queue is busy */
-               blk_mq_do_dispatch_ctx(hctx);
+               dispatch_ret = blk_mq_do_dispatch_ctx(hctx);
        } else {
                blk_mq_flush_busy_ctxs(hctx, &rq_list);
-               blk_mq_dispatch_rq_list(q, &rq_list, false);
+               dispatch_ret = blk_mq_dispatch_rq_list(q, &rq_list, false);
+       }
+
+       if (dispatch_ret) {
+               if (hctx->flags & BLK_MQ_F_FORCE_COMMIT_RQS)
+                       hctx->queue->mq_ops->commit_rqs(hctx);
        }
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 87c6699f35ae..9b46f5d6c7fd 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1238,11 +1238,15 @@ bool blk_mq_dispatch_rq_list(struct request_queue *q, struct list_head *list,
                 * Flag last if we have no more requests, or if we have more
                 * but can't assign a driver tag to it.
                 */
-               if (list_empty(list))
-                       bd.last = true;
-               else {
-                       nxt = list_first_entry(list, struct request, queuelist);
-                       bd.last = !blk_mq_get_driver_tag(nxt);
+               if (!(hctx->flags & BLK_MQ_F_FORCE_COMMIT_RQS)) {
+                       if (list_empty(list))
+                               bd.last = true;
+                       else {
+                               nxt = list_first_entry(list, struct request, queuelist);
+                               bd.last = !blk_mq_get_driver_tag(nxt);
+                       }
+               } else {
+                       bd.last = false;
If we enabled BLK_MQ_F_FORCE_COMMIT_RQS flag, we will always get
bd.last = false even for the real last request in the IO scheduler. I
know you already use commit_irqs() to help to kick driver. But I
worried if it is reasonable that drivers always get bd.last = false.

Thanks for your approach.
quoted
                }

                ret = q->mq_ops->queue_rq(hctx, &bd);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 07fa767bff86..c0ef6990b698 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -394,6 +394,7 @@ enum {
        BLK_MQ_F_SHOULD_MERGE   = 1 << 0,
        BLK_MQ_F_TAG_SHARED     = 1 << 1,
        BLK_MQ_F_NO_MANAGED_IRQ = 1 << 2,
+       BLK_MQ_F_FORCE_COMMIT_RQS = 1 << 3,
        BLK_MQ_F_BLOCKING       = 1 << 5,
        BLK_MQ_F_NO_SCHED       = 1 << 6,
        BLK_MQ_F_ALLOC_POLICY_START_BIT = 8,
Thanks,
Ming

--
Baolin Wang


-- 
Baolin Wang
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help