Thread (43 messages) 43 messages, 5 authors, 2017-11-04

Re: [PATCH V2 0/2] block: remove unnecessary RESTART

From: Ming Lei <hidden>
Date: 2017-11-01 16:59:21

On Wed, Nov 01, 2017 at 04:47:06PM +0000, Bart Van Assche wrote:
On Wed, 2017-11-01 at 12:08 +0800, Ming Lei wrote:
quoted
On Wed, Nov 01, 2017 at 03:54:09AM +0000, Bart Van Assche wrote:
quoted
On Tue, 2017-10-31 at 09:47 +0800, Ming Lei wrote:
quoted
On Mon, Oct 30, 2017 at 08:24:57PM +0000, Bart Van Assche wrote:
quoted
On Fri, 2017-10-27 at 13:38 +0800, Ming Lei wrote:
quoted
On Fri, Oct 27, 2017 at 04:53:18AM +0000, Bart Van Assche wrote:
quoted
On Fri, 2017-10-27 at 12:43 +0800, Ming Lei wrote:
quoted
The 1st patch removes the RESTART for TAG-SHARED because SCSI handles it
by itself, and not necessary to waste CPU to do the expensive RESTART.
And Roman Pen reported that this RESTART cuts half of IOPS in his case.

The 2nd patch removes the RESTART when .get_budget returns BLK_STS_RESOURCE,
and this RESTART is handled by SCSI's RESTART(scsi_end_request()) too.
There are more block drivers than the SCSI core that share tags. If the
Could you share us what the other in-tree driver which share tags is?
I think the following in-tree drivers support shared tags (in alphabetical
order):
* null_blk. See also the shared_tags kernel module parameter.
* nvme. See also nvme_alloc_ns().
* scsi-mq.
For both null_blk and nvme, we don't need to deal with cross-queue RESTART,
because BLK_MQ_S_TAG_WAITING has handled it already.
blk_mq_dispatch_wait_add() returns immediately if BLK_MQ_S_TAG_WAITING is
already set. Are you really sure that removing the restart mechanism doesn't
break the NVMe driver if there are multiple namespaces?> 
If this flag is set, that means blk_mq_dispatch_wake() will be run when
driver tag is available, so the hw queue will be run at that time.
The only hw queue that will be rerun if BLK_MQ_S_TAG_WAITING is set is the hw
queue for which that flag was set. Sorry but I don't think that mechanism by
itself is sufficient to avoid queue lockups with the NVMe driver in case of
shared namespaces. See also the description of commit da55f2cc7841 ("blk-mq:
Why?
use sbq wait queues instead of restart for driver tags").
This commit just make sure that the hctx of BLK_MQ_S_TAG_WAITING will be
run again once there is driver tag available, see its commit log simply:

    However, we can still use the struct sbitmap_queue wait queues with a
    custom callback instead of blocking. This has a few benefits:
    
    1. It avoids iterating over all hardware queues when completing an I/O,
       which the current restart code has to do.
    2. It benefits from the existing rolling wakeup code.
    3. It avoids punting to another thread just to have it block.
    
The commit has described clearly that we don't need to iterate over all hw
queue when completing an I/O.

-- 
Ming
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help