Re: [PATCH] block, bfq: keep peak_rate estimation within range 1..2^32-1

From: Paolo Valente <hidden>
Date: 2018-03-20 03:00:20
Also in: lkml

On 19 Mar 2018, at 14:28, Konstantin Khlebnikov [off-list ref] wrote:

On 19.03.2018 09:03, Paolo Valente wrote:

On 5 Mar 2018, at 04:48, Konstantin Khlebnikov [off-list ref] wrote:

Rate should never overflow or become zero because it is used as divider.
This patch accumulates it with saturation.

Signed-off-by: Konstantin Khlebnikov <redacted>
---
block/bfq-iosched.c |    8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index aeca22d91101..a236c8d541b5 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c

@@ -2546,7 +2546,8 @@ static void bfq_reset_rate_computation(struct bfq_data *bfqd,

static void bfq_update_rate_reset(struct bfq_data *bfqd, struct request *rq)
{
-	u32 rate, weight, divisor;
+	u32 weight, divisor;
+	u64 rate;

	/*
	 * For the convergence property to hold (see comments on

@@ -2634,9 +2635,10 @@ static void bfq_update_rate_reset(struct bfq_data *bfqd, struct request *rq)
	 */
	bfqd->peak_rate *= divisor-1;
	bfqd->peak_rate /= divisor;
-	rate /= divisor; /* smoothing constant alpha = 1/divisor */
+	do_div(rate, divisor);	/* smoothing constant alpha = 1/divisor */

-	bfqd->peak_rate += rate;
+	/* rate should never overlow or become zero */

It is bfqd->peak_rate that is used as a divider, and bfqd->peak_rate
is in no danger of becoming zero even if the variable 'rate' is zero
here.

So I guess the reason why you consider the possibility that
bfqd->peak_rate becomes zero is an overflow when summing 'rate'. But,
according to my calculations, this should be impossible with devices
with sensible speeds.

These are the reasons why I decided I could make do with a 32-bit
variable, without any additional clamping. Did I make any mistake in my
evaluation?

According to Murphy's law this is inevitable..

Yep.  Actually Murphy has even been merciful this time, by making the
failure happen to a kernel expert :)

I've seen a couple of division-by-zero crashes in bfq_wr_duration.
Unfortunately logs weren't recorded.

Anyway, even if I made some mistake about the maximum possible value
of the device rate, and the latter may be too high for bfqd->peak_rate
to contain it, then I guess the right solution would not be to clamp the
actual rate to U32_MAX, but to move bfqd->peak_rate to 64 bits. Or am I
missing something else?

+	bfqd->peak_rate = clamp_t(u64, rate + bfqd->peak_rate, 1, U32_MAX);

32-bit should be enough and better for division.
My patch makes sure it never overflows/underflows.
That's cheaper than full 64-bit/64-bit division.
Anyway 64-bit speed could overflow too. =)
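
For readers who want to experiment with the saturating update outside the kernel, here is a minimal userspace sketch of the step in the quoted hunk. The names clamp_u64() and update_peak_rate() are hypothetical stand-ins (for clamp_t(u64, ...) and for the tail of bfq_update_rate_reset() respectively), plain division replaces do_div(), and peak_rate is assumed to be 32 bits wide as discussed in this thread. It is not the kernel code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the kernel's clamp_t(u64, ...). */
static uint64_t clamp_u64(uint64_t val, uint64_t lo, uint64_t hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/*
 * One low-pass filter step, mirroring the quoted hunk:
 *   peak_rate = peak_rate * (divisor - 1) / divisor + rate / divisor
 * The sum is formed in 64 bits and saturated to 1..UINT32_MAX, so the
 * result can neither wrap around nor become zero.
 */
static uint32_t update_peak_rate(uint32_t peak_rate, uint64_t rate,
				 uint32_t divisor)
{
	assert(divisor != 0);

	/* 32-bit arithmetic, as in the hunk; wraps modulo 2^32. */
	peak_rate *= divisor - 1;
	peak_rate /= divisor;

	/* Plain 64-by-32 division stands in for do_div(rate, divisor). */
	rate /= divisor;

	return (uint32_t)clamp_u64(rate + peak_rate, 1, UINT32_MAX);
}
```

With these definitions, update_peak_rate(0, 0, 10) yields 1 rather than 0, and a sum above UINT32_MAX is pinned to UINT32_MAX instead of being truncated.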

I see your point.  Still, if the mistake is not in sizing, then you
bumped into some odd bug.  In this respect, I don't much like the idea
of sweeping the dust under the carpet.  So, let me ask you for a
little bit more help.  With your patch applied, and thus with no risk
of crashes, what about adding, right before your clamp_t, something
like:

if (!bfqd->peak_rate)
	pr_crit(<dump of all the variables involved in updating bfqd->peak_rate>);

Once the failure shows up (Murphy permitting), we might have hints to
the bug causing it.
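
As an illustration only, the suggested check could look roughly like the userspace sketch below. The struct rate_update_state and the function report_zero_peak_rate() are hypothetical; the dumped fields are just the variables visible in the quoted hunk (rate, weight, divisor, peak_rate), and snprintf() stands in for pr_crit(). Which variables are actually worth dumping is left open above, so this is one guess.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of the values visible in the quoted hunk. */
struct rate_update_state {
	uint32_t peak_rate;
	uint64_t rate;
	uint32_t weight;
	uint32_t divisor;
};

/*
 * If peak_rate is zero, format a one-line report into buf and return 1;
 * otherwise leave buf untouched and return 0. In the kernel the report
 * would go to pr_crit() instead of a caller-supplied buffer.
 */
static int report_zero_peak_rate(const struct rate_update_state *s,
				 char *buf, size_t len)
{
	if (s->peak_rate)
		return 0;

	snprintf(buf, len,
		 "bfq: peak_rate is zero: rate=%" PRIu64
		 " weight=%" PRIu32 " divisor=%" PRIu32,
		 s->rate, s->weight, s->divisor);
	return 1;
}
```

Returning a flag keeps the check cheap on the common path, where peak_rate is non-zero and nothing is formatted.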

Apart from that, I have no problem with patches that make bfq more
robust, even in a sort of black-box way.

Thanks a lot,
Paolo

	update_thr_responsiveness_params(bfqd);

reset_computation:
