Re: [PATCH] ufs: Increase the usable queue depth

From: Can Guo <hidden>
Date: 2021-06-29 13:41:17

Hi Bart,

On 2021-05-14 00:49, Bart Van Assche wrote:


With the current implementation of the UFS driver, active_queues is 1
instead of 0 if all UFS request queues are idle. That causes
hctx_may_queue() to divide the queue depth by 2 when queueing a request
and hence reduces the usable queue depth.
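As a rough illustration of the effect described above, the fair-share limit that hctx_may_queue() applies to a shared tag set can be modeled in user space. This is a simplified sketch, not the kernel code: `fair_share_depth()` is a hypothetical helper, and the round-up division and the minimum of 4 tags are assumed to mirror the block layer of that era.

```c
#include <assert.h>

/*
 * Hypothetical user-space model of the per-queue limit that
 * hctx_may_queue() enforces when a tag set is shared.
 * depth: total number of tags in the shared tag set
 * users: value of active_queues (queues currently marked busy)
 */
static unsigned int fair_share_depth(unsigned int depth, unsigned int users)
{
	unsigned int share;

	assert(depth > 0);
	/* No queue is marked active: no fair-share limit applies. */
	if (users == 0)
		return depth;
	/* Divide the tags evenly between active queues, rounding up ... */
	share = (depth + users - 1) / users;
	/* ... but always allow a few tags (assumed minimum of 4). */
	return share < 4 ? 4 : share;
}
```

Assuming a 32-tag set, `fair_share_depth(32, 1)` is 32 while `fair_share_depth(32, 2)` is 16: a stale active_queues count of 1, plus the SCSI queue marking itself busy, is enough to halve the usable depth, consistent with the 16 to 32 change reported below.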

The shared tag set code in the block layer keeps track of the number of
active request queues. blk_mq_tag_busy() is called before a request is
queued onto a hwq and blk_mq_tag_idle() is called some time after the hwq
became idle. blk_mq_tag_idle() is called from inside blk_mq_timeout_work().
Hence, blk_mq_tag_idle() is only called if a timer is associated with each
request that is submitted to a request queue that shares a tag set with
another request queue. This patch therefore adds a blk_mq_start_request()
call to ufshcd_exec_dev_cmd(). This patch doubles the queue depth on my
test setup from 16 to 32.
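The busy/idle accounting described above can be sketched as a small user-space model. All names here (`model_tag_set`, `model_submit()` and so on) are hypothetical, and this is not the block-layer code; the model only assumes, as the description says, that blk_mq_start_request() is what arms the queue's timeout timer and that blk_mq_tag_idle() runs from the timeout work once no requests are pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a tag set shared by several request queues. */
struct model_tag_set {
	unsigned int active_queues;	/* queues currently marked busy */
};

/* Hypothetical model of one hardware queue that uses the shared tag set. */
struct model_queue {
	struct model_tag_set *set;
	bool tag_active;	/* stands in for BLK_MQ_S_TAG_ACTIVE */
	bool timer_armed;	/* stands in for the queue's timeout timer */
	unsigned int in_flight;	/* requests submitted but not completed */
};

/* Stands in for blk_mq_tag_busy(): count the queue as active once. */
static void model_tag_busy(struct model_queue *q)
{
	if (!q->tag_active) {
		q->tag_active = true;
		q->set->active_queues++;
	}
}

/* Stands in for blk_mq_tag_idle(): stop counting the queue as active. */
static void model_tag_idle(struct model_queue *q)
{
	if (q->tag_active) {
		q->tag_active = false;
		q->set->active_queues--;
	}
}

/* Submit a request; only a "started" request arms the timeout timer. */
static void model_submit(struct model_queue *q, bool start_request)
{
	model_tag_busy(q);
	q->in_flight++;
	if (start_request)
		q->timer_armed = true;	/* what blk_mq_start_request() would do */
}

static void model_complete(struct model_queue *q)
{
	assert(q->in_flight > 0);
	q->in_flight--;
}

/*
 * Stands in for the timer firing and blk_mq_timeout_work() running.
 * If no timer was ever armed, this never happens and the queue is
 * counted as active forever.
 */
static void model_timer_expires(struct model_queue *q)
{
	if (!q->timer_armed)
		return;
	if (q->in_flight > 0)
		return;	/* requests still pending: timer stays armed */
	q->timer_armed = false;
	model_tag_idle(q);
}
```

Submitting and completing a request without "starting" it leaves active_queues at 1 indefinitely, which is the situation the patch description attributes to ufshcd_exec_dev_cmd().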

In addition to increasing the usable queue depth, also fix the
documentation of the 'timeout' parameter in the header above
ufshcd_exec_dev_cmd().

Cc: Can Guo <redacted>
Cc: Alim Akhtar <alim.akhtar@samsung.com>
Cc: Avri Altman <avri.altman@wdc.com>
Cc: Stanley Chu <redacted>
Cc: Bean Huo <redacted>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Fixes: 7252a3603015 ("scsi: ufs: Avoid busy-waiting by eliminating tag conflicts")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/scsi/ufs/ufshcd.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index c96e36aab989..e669243354da 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -2838,7 +2838,7 @@ static int ufshcd_wait_for_dev_cmd(struct ufs_hba *hba,
  * ufshcd_exec_dev_cmd - API for sending device management requests
  * @hba: UFS hba
  * @cmd_type: specifies the type (NOP, Query...)
- * @timeout: time in seconds
+ * @timeout: timeout in milliseconds
  *
  * NOTE: Since there is only one available tag for device management commands,
  * it is expected you hold the hba->dev_cmd.lock mutex.
@@ -2868,6 +2868,9 @@ static int ufshcd_exec_dev_cmd(struct ufs_hba *hba,
 	}
 	tag = req->tag;
 	WARN_ON_ONCE(!ufshcd_valid_tag(hba, tag));
+	/* Set the timeout such that the SCSI error handler is not activated. */
+	req->timeout = msecs_to_jiffies(2 * timeout);
+	blk_mq_start_request(req);

 	init_completion(&wait);
 	lrbp = &hba->lrb[tag];
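The new hunk converts the device-command timeout (in milliseconds, per the corrected kernel-doc) into jiffies and doubles it. Presumably the intent is that ufshcd_wait_for_dev_cmd(), which waits for `timeout` milliseconds, gives up well before the block-layer timer expires, so the SCSI error handler never sees these requests; that reading is an inference from the comment in the patch. The sketch below models only the arithmetic. `model_msecs_to_jiffies()` and `model_dev_cmd_rq_timeout()` are hypothetical, and the 1500 ms timeout and HZ values used with them are example inputs.

```c
#include <assert.h>

/*
 * Hypothetical model of msecs_to_jiffies(), valid only for HZ values
 * that divide 1000 (e.g. 100, 250, 1000). Rounds up so that a short,
 * non-zero timeout never becomes zero ticks.
 */
static unsigned long model_msecs_to_jiffies(unsigned long ms, unsigned long hz)
{
	assert(hz > 0 && 1000 % hz == 0);
	return (ms * hz + 999) / 1000;
}

/* Block-layer timeout for a device command, as set by the hunk above. */
static unsigned long model_dev_cmd_rq_timeout(unsigned long timeout_ms,
					      unsigned long hz)
{
	return model_msecs_to_jiffies(2 * timeout_ms, hz);
}
```

For example, with HZ=250 a 1500 ms device-command timeout gives a block-layer timeout of 750 jiffies (3 seconds).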

We found a regression after this change was merged:

schedule
blk_mq_get_tag
__blk_mq_alloc_request
blk_get_request
ufshcd_exec_dev_cmd
ufshcd_query_flag
ufshcd_wb_ctrl
ufshcd_devfreq_scale
ufshcd_devfreq_target
devfreq_set_target
update_devfreq
devfreq_monitor
process_one_work
worker_thread
kthread
ret_from_fork

Since ufshcd_devfreq_scale() blocks SCSI requests, if
ufshcd_wb_ctrl() cannot get a free tag when it runs
(because all tags are held by normal requests), then
ufshcd_devfreq_scale() gets stuck. The SCSI layer thus
stays blocked, which leads to an I/O hang. Maybe consider
unblocking SCSI requests before calling ufshcd_wb_ctrl()?
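The hang described above can be illustrated with a toy model. Everything here (`toy_host`, `toy_scale()` and so on) is hypothetical and is not ufshcd code; the blocking tag allocation is replaced by a non-blocking check, and "unblock first" only models the suggestion in this mail, not a confirmed fix.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model (hypothetical, not ufshcd code) of one tag set shared by
 * normal SCSI requests and device-management commands.
 */
struct toy_host {
	unsigned int nr_tags;		/* tags in the shared set */
	unsigned int tags_in_use;	/* tags held by normal requests */
	bool scsi_blocked;		/* SCSI requests blocked during scaling */
};

/* Normal requests can only complete, and free their tags, if unblocked. */
static void toy_drain_normal_requests(struct toy_host *h)
{
	if (!h->scsi_blocked)
		h->tags_in_use = 0;
}

/* Non-blocking stand-in for the tag allocation in blk_get_request(). */
static bool toy_try_get_tag(struct toy_host *h)
{
	assert(h->tags_in_use <= h->nr_tags);
	if (h->tags_in_use == h->nr_tags)
		return false;	/* the real path sleeps in blk_mq_get_tag() */
	h->tags_in_use++;
	return true;
}

/*
 * Models the scaling path: block SCSI requests, then issue a query
 * (ufshcd_wb_ctrl() -> ufshcd_exec_dev_cmd()) that needs a tag.
 * unblock_first models the suggestion to unblock SCSI requests first.
 */
static bool toy_scale(struct toy_host *h, bool unblock_first)
{
	h->scsi_blocked = true;
	if (unblock_first)
		h->scsi_blocked = false;
	toy_drain_normal_requests(h);
	return toy_try_get_tag(h);
}
```

With all 32 tags held, `toy_scale(&h, false)` fails, which in the real, blocking code corresponds to sleeping forever in blk_mq_get_tag(); with `unblock_first` set, the normal requests drain and the query obtains a tag.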

Thanks,

Can Guo.
