Re: [bug report] shared tags causes IO hang and performance drop
From: John Garry <hidden>
Date: 2021-04-26 17:05:52
Also in: linux-scsi
On 26/04/2021 17:03, Ming Lei wrote:
>> For both hostwide and non-hostwide tags, we have standalone sched tags and
>> request pool per hctx when q->nr_hw_queues > 1.
>
> driver tags is shared for hostwide tags.
>
>>> That is why you observe that scheduler tag exhaustion is easy to trigger
>>> in case of non-hostwide tags.
>>>
>>> I'd suggest to add one per-request-queue sched tags, and make all hctxs
>>> sharing it, just like what you did for driver tag.
>>
>> That sounds reasonable.
>>
>> But I don't see how this is related to hostwide tags specifically, but
>> rather just having q->nr_hw_queues > 1, which NVMe PCI and some other SCSI
>> MQ HBAs have (without using hostwide tags).
>
> Before hostwide tags, the whole scheduler queue depth should be 256.
> After hostwide tags, the whole scheduler queue depth becomes 256 *
> nr_hw_queues. But the driver tag queue depth is _not_ changed.
Fine.
> More requests come and are tried to dispatch to LLD and can't succeed
> because of limited driver tag depth, and CPU utilization could be increased.
Right, maybe this is a problem.

I quickly added some debug, and see that
__blk_mq_get_driver_tag()->__sbitmap_queue_get() fails ~7% of the time for
hostwide tags and ~3% for non-hostwide tags.

Having it fail at all for non-hostwide tags seems a bit dubious... here's
the code for deciding the rq sched tag depth:

	q->nr_requests = 2 * min(q->tag_set->queue_depth [128], BLKDEV_MAX_RQ [128])

So we get 256 for our test scenario, which is appreciably bigger than
q->tag_set->queue_depth, so the failures make sense.

Anyway, I'll look at adding code for per-request-queue sched tags to see if
it helps. But I would plan to continue to use a per-hctx sched request pool.

Thanks,
John
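
For reference, the depth arithmetic discussed above can be sketched as a small user-space C snippet. This is only an illustration, not kernel code: the helper names sched_depth_per_hctx() and sched_depth_total() are hypothetical, and the value 128 for BLKDEV_MAX_RQ is taken from the figures quoted in this thread.

```c
#include <assert.h>

/* Assumed from the values quoted in this thread, not read from kernel headers. */
#define BLKDEV_MAX_RQ 128u

/*
 * Hypothetical helper: scheduler tag depth for one hctx,
 * mirroring nr_requests = 2 * min(queue_depth, BLKDEV_MAX_RQ).
 */
unsigned int sched_depth_per_hctx(unsigned int queue_depth)
{
	unsigned int d = queue_depth < BLKDEV_MAX_RQ ? queue_depth : BLKDEV_MAX_RQ;

	return 2 * d;
}

/*
 * Hypothetical helper: total scheduler depth when every hctx has its own
 * sched tags, i.e. the "256 * nr_hw_queues" case.
 */
unsigned int sched_depth_total(unsigned int queue_depth,
			       unsigned int nr_hw_queues)
{
	assert(nr_hw_queues > 0);

	return sched_depth_per_hctx(queue_depth) * nr_hw_queues;
}
```

With queue_depth = 128 this gives 256 scheduler tags per hctx. Taking 16 hardware queues as an assumed example, per-hctx sched tags allow up to 4096 requests to compete for the same 128 driver tags when the driver tags are shared hostwide.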