Re: [PATCH] block, bfq: keep peak_rate estimation within range 1..2^32-1
From: Paolo Valente <hidden>
Date: 2018-03-20 03:00:20
Also in:
lkml
On 19 Mar 2018, at 14:28, Konstantin Khlebnikov [off-list ref] wrote:
> On 19.03.2018 09:03, Paolo Valente wrote:
>> On 05 Mar 2018, at 04:48, Konstantin Khlebnikov [off-list ref] wrote:
>>> Rate should never overflow or become zero because it is used as divider.
>>> This patch accumulates it with saturation.
>>>
>>> Signed-off-by: Konstantin Khlebnikov <redacted>
>>> ---
>>>  block/bfq-iosched.c | 8 +++++---
>>>  1 file changed, 5 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
>>> index aeca22d91101..a236c8d541b5 100644
>>> --- a/block/bfq-iosched.c
>>> +++ b/block/bfq-iosched.c
>>> @@ -2546,7 +2546,8 @@ static void bfq_reset_rate_computation(struct bfq_data *bfqd,
>>>
>>>  static void bfq_update_rate_reset(struct bfq_data *bfqd, struct request *rq)
>>>  {
>>> -	u32 rate, weight, divisor;
>>> +	u32 weight, divisor;
>>> +	u64 rate;
>>>
>>>  	/*
>>>  	 * For the convergence property to hold (see comments on
>>> @@ -2634,9 +2635,10 @@ static void bfq_update_rate_reset(struct bfq_data *bfqd, struct request *rq)
>>>  	 */
>>>  	bfqd->peak_rate *= divisor-1;
>>>  	bfqd->peak_rate /= divisor;
>>> -	rate /= divisor; /* smoothing constant alpha = 1/divisor */
>>> +	do_div(rate, divisor);	/* smoothing constant alpha = 1/divisor */
>>>
>>> -	bfqd->peak_rate += rate;
>>> +	/* rate should never overflow or become zero */
>>
>> It is bfqd->peak_rate that is used as a divider, and bfqd->peak_rate is not at risk of being zero even if the variable 'rate' is zero here.
>>
>> So I guess the reason why you consider the possibility that bfqd->peak_rate becomes zero is because of an overflow when summing 'rate'. But, according to my calculations, this should be impossible with devices with sensible speeds.
>>
>> These are the reasons why I decided I could make it with a 32-bit variable, without any additional clamping. Did I make any mistake in my evaluation?
> According to Murphy's law this is inevitable..

Yep. Actually Murphy has been even clement this time, by making the failure occur to a kernel expert :)

> I've seen a couple of division-by-zero crashes in bfq_wr_duration. Unfortunately logs weren't recorded.
>
>> Anyway, even if I made some mistake about the maximum possible value of the device rate, and the latter may be too high for bfqd->peak_rate to contain it, then I guess the right solution would not be to clamp the actual rate to U32_MAX, but to move bfqd->peak_rate to 64 bits. Or am I missing something else?
>>
>>> +	bfqd->peak_rate = clamp_t(u64, rate + bfqd->peak_rate, 1, U32_MAX);
>
> 32-bit should be enough and better for division. My patch makes sure it never overflows/underflows. That's cheaper than full 64-bit/64-bit division. Anyway 64-bit speed could overflow too. =)

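For readers following the arithmetic, the saturating update discussed above can be sketched in user-space C. This is only a minimal sketch under stated assumptions, not the kernel code: update_peak_rate() and clamp_u64() are hypothetical names standing in for the patched lines and for the kernel's clamp_t(), and widening the multiplication to 64 bits is a choice of this sketch, not something the quoted patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for clamp_t(u64, v, lo, hi). */
static uint64_t clamp_u64(uint64_t v, uint64_t lo, uint64_t hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * One smoothing step with alpha = 1/divisor:
 *   new = old * (divisor - 1) / divisor + rate / divisor
 * The sum is formed in 64 bits and then saturated to [1, UINT32_MAX],
 * so the result can neither wrap around nor become zero.
 */
static uint32_t update_peak_rate(uint32_t peak_rate, uint64_t rate,
				 uint32_t divisor)
{
	uint64_t old, sample;

	assert(divisor != 0);

	/* Assumption of this sketch: scale the old value in 64 bits. */
	old = (uint64_t)peak_rate * (divisor - 1) / divisor;
	sample = rate / divisor;

	return (uint32_t)clamp_u64(old + sample, 1, UINT32_MAX);
}
```

With these definitions, update_peak_rate(0, 0, 2) yields 1 rather than 0, and an oversized sample such as update_peak_rate(UINT32_MAX, UINT64_MAX, 2) saturates at UINT32_MAX.
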
I see your point. Still, if the mistake is not in sizing, then you bumped into some odd bug. In this respect, I don't much like the idea of sweeping the dust under the carpet. So, let me ask you for a little bit more help. With your patch applied, and thus with no risk of crashes, what about adding, right before your clamp_t, something like:

if (!bfqd->peak_rate)
	pr_crit(<dump of all the variables involved in updating bfqd->peak_rate>);

Once the failure shows up (Murphy permitting), we might have hints to the bug causing it.

Apart from that, I have no problem with patches that make bfq more robust, even in a sort of black-box way.

Thanks a lot,
Paolo

>>>
>>>  	update_thr_responsiveness_params(bfqd);
>>>
>>>  reset_computation:
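
The diagnostic check proposed in the reply (report the inputs whenever peak_rate is found to be zero) could look roughly like the following user-space sketch. Everything here is an assumption made for illustration: struct rate_snapshot and report_zero_peak_rate() are hypothetical names, the fields are limited to the variables visible in the quoted diff, and snprintf() stands in for the kernel's pr_crit().

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical snapshot of the values that appear in the quoted diff. */
struct rate_snapshot {
	uint32_t peak_rate;
	uint64_t rate;
	uint32_t weight;
	uint32_t divisor;
};

/*
 * Write a one-line report into buf if peak_rate is zero.
 * Returns 1 when a report was written, 0 when peak_rate is nonzero.
 */
static int report_zero_peak_rate(const struct rate_snapshot *s,
				 char *buf, size_t len)
{
	assert(s != NULL && buf != NULL && len > 0);

	if (s->peak_rate != 0)
		return 0;

	snprintf(buf, len,
		 "peak_rate=0: rate=%" PRIu64 " weight=%" PRIu32
		 " divisor=%" PRIu32,
		 s->rate, s->weight, s->divisor);
	return 1;
}
```

In the kernel the same condition would be tested right before the clamp, as suggested in the message, so that the values leading to a zero peak_rate are logged before the clamp hides them.
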