Thread: 27 messages, 4 authors, 2021-03-11

Re: [PATCH v3 09/11] dm: support IO polling for bio-based dm device

From: JeffleXu <jefflexu@linux.alibaba.com>
Date: 2021-02-09 06:24:17
Also in: dm-devel, linux-block


On 2/9/21 2:13 PM, JeffleXu wrote:

On 2/9/21 11:11 AM, Ming Lei wrote:
quoted
On Mon, Feb 08, 2021 at 04:52:41PM +0800, Jeffle Xu wrote:
quoted
DM will iterate and poll all polling hardware queues of all target mq
devices when polling IO for dm device. To mitigate the race introduced
by iterating all target hw queues, a per-hw-queue flag is maintained
What is the per-hw-queue flag?
Sorry I forgot to update the commit message as the implementation
changed. Actually this mechanism is implemented by patch 10 of this
patch set.
quoted
quoted
to indicate whether this polling hw queue is currently being polled or
not. Every polling hw queue is exclusive to one polling instance, i.e.,
the polling instance will skip this polling hw queue if it is currently
being polled by another polling instance, and start polling on the next
hw queue.
I don't see such a skip in dm_poll_one_dev(), where
queue_for_each_poll_hw_ctx() is called directly to poll all POLL
hctxs of the request queue, so can you explain this skip mechanism
a bit more?
It is implemented as patch 10 of this patch set. When spin_trylock()
fails, the polling instance will return immediately, instead of busy
waiting.
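
The skip-on-trylock-failure behaviour described above can be sketched in
userspace C roughly as follows. This is only an illustration, not the code
from patch 10: struct poll_queue, poll_queue_trylock(), poll_queue_unlock()
and poll_all_queues() are hypothetical names, and a C11 atomic flag stands
in for the kernel spinlock taken with spin_trylock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a polling hw queue (not struct blk_mq_hw_ctx). */
struct poll_queue {
    atomic_bool busy;   /* true while some polling instance owns the queue */
    int pending;        /* completions waiting to be reaped */
};

/* Try to claim the queue without waiting, in the spirit of spin_trylock(). */
static bool poll_queue_trylock(struct poll_queue *q)
{
    return !atomic_exchange_explicit(&q->busy, true, memory_order_acquire);
}

static void poll_queue_unlock(struct poll_queue *q)
{
    atomic_store_explicit(&q->busy, false, memory_order_release);
}

/*
 * Iterate all polling queues. A queue already owned by another polling
 * instance is skipped instead of being waited on. Returns the number of
 * completions reaped by this caller.
 */
static int poll_all_queues(struct poll_queue *queues, size_t nr)
{
    int reaped = 0;

    assert(queues != NULL || nr == 0);
    for (size_t i = 0; i < nr; i++) {
        struct poll_queue *q = &queues[i];

        if (!poll_queue_trylock(q))
            continue;           /* owned elsewhere: move on, no busy wait */
        reaped += q->pending;
        q->pending = 0;
        poll_queue_unlock(q);
    }
    return reaped;
}
```

A caller that finds a queue locked simply leaves that queue's completions
to whichever instance holds it, which is what keeps pollers from spinning
on each other.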

quoted
Even if such skipping is implemented, I am not sure good performance
can be reached, because hctx polling may be done in ping-pong style
among several CPUs, whereas a blk-mq hctx is supposed to have its own
CPU affinity.
Yes, the mechanism of iterating all hw queues can make the competition
worse.

If every underlying data device has **only** one polling hw queue, then
this ping-pong style polling still exists, even if we implement the
split bio tracking mechanism, i.e., identifying the specific hw queue
each bio was enqueued into, because multiple polling instances have to
compete for the only polling hw queue.

But if multiple polling hw queues per device are reserved for multiple
polling instances (e.g., every underlying data device has 3 polling hw
queues when there are 3 polling instances), just as we do for mq
polling, then the current implementation of iterating all hw queues
will indeed work in a ping-pong style, while this issue would not exist
if an accurate split bio tracking mechanism could be implemented.

If process migration is not considered, I could avoid iterating all hw
queues while staying within the framework of the current
implementation. For example, the CPU number of the IO submitting
process could be stored in the cookie, and the polling routine would
then iterate only the hw queues to which the stored CPU number maps.
Just a tentative idea, though ....
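
A rough userspace sketch of that idea, assuming no process migration. All
names here (cookie_t, struct target_dev, cookie_from_cpu(),
cookie_to_cpu(), poll_by_cookie()) are hypothetical and are not part of
this patch set; the per-device cpu_to_queue[] table merely stands in for
whatever CPU-to-hctx mapping the block layer provides.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS      4   /* assumed CPU count for the illustration */
#define SKETCH_NR_QUEUES    4   /* assumed max polling queues per device */

typedef unsigned int cookie_t;  /* hypothetical cookie type */

/* Hypothetical underlying target device with its own CPU-to-queue map. */
struct target_dev {
    unsigned int cpu_to_queue[SKETCH_NR_CPUS];
    int pending[SKETCH_NR_QUEUES];  /* completions waiting per queue */
};

/* Record the submitting CPU in the cookie at submission time. */
static cookie_t cookie_from_cpu(unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    return cpu;
}

static unsigned int cookie_to_cpu(cookie_t cookie)
{
    return cookie;
}

/*
 * Poll only the one queue per target device that the stored CPU maps to,
 * rather than iterating every polling queue of every device.
 */
static int poll_by_cookie(struct target_dev *devs, size_t nr_devs,
                          cookie_t cookie)
{
    unsigned int cpu = cookie_to_cpu(cookie);
    int reaped = 0;

    for (size_t i = 0; i < nr_devs; i++) {
        unsigned int q = devs[i].cpu_to_queue[cpu];

        reaped += devs[i].pending[q];
        devs[i].pending[q] = 0;
    }
    return reaped;
}
```

If the submitter migrates to another CPU after submission, the stored CPU
still selects the queue the bio went to, but the poller then runs on a CPU
outside that queue's affinity, which is why migration is excluded here.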


-- 
Thanks,
Jeffle