Thread (53 messages, 5 authors, 2017-03-31)

Re: [PATCH RFC 00/14] Add the BFQ I/O Scheduler to blk-mq

From: Paolo Valente <hidden>
Date: 2017-03-18 10:53:52
Also in: lkml

On 14 Mar 2017, at 16:32, Bart Van Assche [off-list ref] wrote:
On Tue, 2017-03-14 at 16:35 +0100, Paolo Valente wrote:
On 07 Mar 2017, at 02:00, Bart Van Assche [off-list ref] wrote:
Additionally, the complexity of the code is huge. Just like for CFQ,
sooner or later someone will run into a bug or a performance issue
and will post a patch to fix it. However, the complexity of BFQ is
such that a source code review alone won't be sufficient to verify
whether or not such a patch negatively affects a workload or device
that has not been tested by the author of the patch. This raises the
question: what process should be followed to verify future BFQ patches?
Third and last, a proposal: why don't we discuss this issue at LSF
too? In particular, we could go through the parts of BFQ that seem
hardest to understand, until they become clearer to you. Then I
could try to understand what helped make them clearer, and translate
that into extra comments in the code or into other, more radical
changes.
Hello Paolo,
Sorry if my comment was not clear enough. Suppose that, e.g., someone would
like to modify the following code:
static int bfq_min_budget(struct bfq_data *bfqd)
{
      if (bfqd->budgets_assigned < bfq_stats_min_budgets)
              return bfq_default_max_budget / 32;
      else
              return bfqd->bfq_max_budget / 32;
}
How can one predict the performance impact of any change to, e.g., this
function? It is really great that a performance benchmark is available. But
what should a developer do who has access to only a small subset of the
storage devices supported by the Linux kernel, and hence cannot run the
benchmark against every supported storage device? Do developers who do not
fully understand the BFQ algorithms and who run into a performance problem
have any option other than trial and error for fixing such issues?
Hi Bart,
perhaps I had already understood your point, but my previous reply did
not address it properly. You are highlighting an important problem
which, I think, can be stated in more general terms: if one changes any
complex component that, in turn, interacts with complex I/O devices,
then it is hard, if at all possible, to prove by reasoning alone that
the change will cause no regression on every possible device. In fact,
experience shows that this often holds even for simple components,
given the complexity of the environment in which they operate. Of
course, if the component is complex and, in addition, whoever modifies
it does not fully understand how it works, then regressions on
untested devices become even more likely.

These general considerations are the motivation for my previous
proposals: reduce complexity by breaking the code into simpler,
independent pieces; fix or improve the documentation where needed or
useful (why don't we discuss the most obscure parts at LSF/MM?); and
use a fixed set of benchmarks to find regressions. Any other proposal
is more than welcome.

Thanks,
Paolo
