Thread: 29 messages, 5 authors, 2017-11-06

Re: high overhead of functions blkg_*stats_* in bfq

From: Tejun Heo <tj@kernel.org>
Date: 2017-10-18 13:19:21

Hello, Paolo.

On Tue, Oct 17, 2017 at 12:11:01PM +0200, Paolo Valente wrote:
...
> protected by a per-device scheduler lock.  To give you an idea, on an
> Intel i7-4850HQ, and with 8 threads doing random I/O in parallel on
> null_blk (configured with 0 latency), if the update of groups stats is
> removed, then the throughput grows from 260 to 404 KIOPS.  This and
> all the other results we might share in this thread can be reproduced
> very easily with a (useful) script made by Luca Miccio [1].

I don't think the old request_queue is ever built for multiple CPUs
hitting on a mem-backed device.

> We tried to understand the reason for this high overhead, and, in
> particular, to find out whether there was some issue that we
> could address on our own.  But the causes seem somehow substantial:
> one of the most time-consuming operations needed by some blkg_*stats_*
> functions is, e.g., find_next_bit, for which we don't see any trivial
> replacement.

Can you point to the specific ones?  I can't find find_next_bit usages
in generic blkg code.

> So, as a first attempt to reduce this severe slowdown, we have made a
> patch that moves the invocation of blkg_*stats_* functions outside the
> critical sections protected by the bfq lock.  Still, these functions
> apparently need to be protected with the request_queue lock, because

blkgs are already protected with RCU, so RCU protection should be
enough.
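
For illustration only, and not taken from the patch under discussion:
a minimal userspace sketch of the general idea of moving a statistics
update out of a lock's critical section.  All names here (struct
group_stats, struct sched, dispatch_one) are hypothetical stand-ins,
not kernel or bfq identifiers, and C11 atomics are used in place of
the kernel's own stat counters and RCU.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical per-group counters; NOT the kernel's blkg stat types. */
struct group_stats {
	atomic_ulong ios;
	atomic_ulong bytes;
};

/* Hypothetical scheduler; the mutex plays the per-device lock's role. */
struct sched {
	pthread_mutex_t lock;
	unsigned long dispatched;	/* state that really needs the lock */
};

static void dispatch_one(struct sched *s, struct group_stats *st,
			 unsigned long nbytes)
{
	assert(s && st);

	/* Only the scheduling state is touched inside the critical section. */
	pthread_mutex_lock(&s->lock);
	s->dispatched++;
	pthread_mutex_unlock(&s->lock);

	/* Counters are updated after unlocking; atomics keep them consistent. */
	atomic_fetch_add_explicit(&st->ios, 1, memory_order_relaxed);
	atomic_fetch_add_explicit(&st->bytes, nbytes, memory_order_relaxed);
}
```

The sketch assumes the counters need no ordering with respect to the
scheduler state, which is why relaxed atomics are used; whether that
holds for the real stats is exactly the question raised above.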

Thanks.

-- 
tejun