Re: high overhead of functions blkg_*stats_* in bfq
From: Tejun Heo <tj@kernel.org>
Date: 2017-10-18 13:19:21
Hello, Paolo.

On Tue, Oct 17, 2017 at 12:11:01PM +0200, Paolo Valente wrote:
> ...
> protected by a per-device scheduler lock. To give you an idea, on an Intel i7-4850HQ, with 8 threads doing random I/O in parallel on null_blk (configured with 0 latency), removing the update of group stats makes the throughput grow from 260 to 404 KIOPS. This and all the other results we might share in this thread can be reproduced very easily with a (useful) script made by Luca Miccio [1].
I don't think the old request_queue was ever built for multiple CPUs hitting a memory-backed device.
> We tried to understand the reason for this high overhead and, in particular, to find out whether there was some issue that we could address on our own. But the causes seem to be substantial: one of the most time-consuming operations needed by some blkg_*stats_* functions is, for example, find_next_bit, for which we don't see any trivial replacement.
Can you point to the specific ones? I can't find find_next_bit usages in generic blkg code.
> So, as a first attempt to reduce this severe slowdown, we have made a patch that moves the invocation of blkg_*stats_* functions outside the critical sections protected by the bfq lock. Still, these functions apparently need to be protected with the request_queue lock, because
blkgs are already protected with RCU, so RCU protection should be enough.

Thanks.

-- 
tejun