Re: [PATCH V3] blk-mq: introduce BLK_STS_DEV_RESOURCE
From: Ming Lei <hidden>
Date: 2018-01-27 23:41:57
Also in:
dm-devel, linux-scsi
On Sat, Jan 27, 2018 at 10:12:43PM +0000, Bart Van Assche wrote:
> On Sat, 2018-01-27 at 14:09 -0500, Mike Snitzer wrote:
> > Ming let me know that he successfully tested this V3 patch using both your test (fio to both mpath and underlying path) and Bart's (02-mq with can_queue in guest).
> >
> > Would be great if you'd review and verify this fix works for you too.
> >
> > Ideally we'd get a fix for this regression staged for 4.16 inclusion. This V3 patch seems like the best option we have at this point.
>
> Hello Mike,
>
> There are several issues with the patch at the start of this thread:
> - It is an unnecessary change of the block layer API. Queue stalls can already be addressed with the current block layer API, namely by inserting a blk_mq_delay_run_hw_queue() call before returning BLK_STS_RESOURCE.

Again, both Jens and I concluded that it is a generic issue, which needs a generic solution:

https://marc.info/?l=linux-kernel&m=151638176727612&w=2

Otherwise, we would need to change the handling of every BLK_STS_RESOURCE in drivers. Do we really want to do that?

Not to mention that the request isn't added to the dispatch list yet in .queue_rq(). Strictly speaking, it is not correct to call blk_mq_delay_run_hw_queue() in .queue_rq(), so the current block layer API can't handle it well enough.

> - The patch at the start of this thread further complicates code that is already too complicated, namely the blk-mq core.
That is just your opinion, I don't agree.
> - The patch at the start of this thread introduces a regression in the SCSI core, namely a queue stall if a request completion occurs concurrently with the newly added BLK_MQ_S_SCHED_RESTART test in the blk-mq core.

This patch only moves the blk_mq_delay_run_hw_queue() call from scsi_queue_rq() to blk-mq. Again, please explain in detail how this V3 patch introduces this regression on SCSI.

Actually, this patch should fix a race on SCSI-MQ: when scsi_queue_rq() calls blk_mq_delay_run_hw_queue(), the request isn't in the dispatch list yet, so in theory this request may not be visible when __blk_mq_run_hw_queue() is run. Don't expect the 3ms delay to cover that; it is absolutely fragile to depend on timing to deal with the race.

Maybe it can be one LSF/MM topic proposal...

thanks,
Ming
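
The ordering argument above can be illustrated with a toy userspace model. This is only a sketch under simplifying assumptions, not kernel code: every name in it (toy_hctx, toy_run_hw_queue, toy_stalled_requests, the rerun_site values) is hypothetical, there is a single request, and other events that can re-run a queue, such as request completions, are ignored. It only shows why scheduling the re-run before the request is on the dispatch list leaves a window that scheduling it afterwards does not.

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not kernel APIs. */
enum rerun_site {
    RERUN_FROM_DRIVER, /* re-run scheduled inside the driver's queue_rq */
    RERUN_FROM_CORE    /* re-run scheduled by the core after requeueing */
};

struct toy_hctx {
    int dispatch_len;   /* requests currently on the dispatch list */
    bool run_scheduled; /* a delayed queue run is pending */
};

/* A queue run can only pick up what is already on the dispatch list. */
static void toy_run_hw_queue(struct toy_hctx *h)
{
    if (!h->run_scheduled)
        return;
    h->run_scheduled = false;
    h->dispatch_len = 0; /* everything visible right now is dispatched */
}

/*
 * One request hits a busy device. Returns the number of requests left
 * on the dispatch list with no run pending, i.e. stalled requests.
 * run_fires_early models the delayed run executing before the core
 * has put the request on the dispatch list.
 */
static int toy_stalled_requests(enum rerun_site site, bool run_fires_early)
{
    struct toy_hctx h = { 0, false };

    /* Step 1: the driver reports "busy" from its queue_rq. */
    if (site == RERUN_FROM_DRIVER)
        h.run_scheduled = true; /* request is not visible yet */

    /* Window: a run scheduled in step 1 may already execute here. */
    if (run_fires_early)
        toy_run_hw_queue(&h);

    /* Step 2: the core puts the request on the dispatch list. */
    h.dispatch_len++;

    /* Step 3: a core-side re-run is scheduled only now. */
    if (site == RERUN_FROM_CORE)
        h.run_scheduled = true;

    /* Any run that is still pending eventually fires. */
    toy_run_hw_queue(&h);

    return h.dispatch_len;
}
```

In this model, toy_stalled_requests(RERUN_FROM_DRIVER, true) returns 1, because the run fired while the list was still empty, whereas both RERUN_FROM_CORE cases return 0, because the run is never scheduled before the request is visible. The driver-side case only returns 0 when the timing happens to work out, which is the dependence on the delay criticized above.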