Thread: 8 messages, 4 authors, 2021-06-22

Re: [PATCH] backing_dev_info: introduce min_bw/max_bw limits

From: Hillf Danton <hidden>
Date: 2021-06-18 09:13:01

On Fri, 18 Jun 2021 10:31:35 +0200 Michael Stapelberg wrote:
Hey Miklos

Thanks for taking a look!

On Fri, 18 Jun 2021 at 10:18, Miklos Szeredi [off-list ref] wrote:
On Thu, 17 Jun 2021 at 11:53, Michael Stapelberg
[off-list ref] wrote:
These new knobs allow e.g. FUSE file systems to guide kernel memory
writeback bandwidth throttling.

Background:

When using mmap(2) to read/write files, the page-writeback code tries to
measure how quick file system backing devices (BDI) are able to write data,
so that it can throttle processes accordingly.

Unfortunately, certain usage patterns, such as linkers (tested with GCC,
but also the Go linker) seem to hit an unfortunate corner case when writing
their large executable output files: the kernel only ever measures
the (non-representative) rising slope of the starting bulk write, but the
whole file write is already over before the kernel could possibly measure
the representative steady-state.

As a consequence, with each program invocation hitting this corner case,
the FUSE write bandwidth steadily sinks in a downward spiral, until it
eventually reaches 0 (!). This results in the kernel heavily throttling
page dirtying in programs trying to write to FUSE, which in turn manifests
itself in slow or even entirely stalled linker processes.

Change:

This commit adds 2 knobs which allow avoiding this situation entirely on a
per-file-system basis by restricting the minimum/maximum bandwidth.
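
As a rough illustration of the idea (not the patch itself), the sketch
below models a smoothed bandwidth estimate that is clamped to a
per-file-system range. The function names, the 1/8 smoothing step, and all
numbers are assumptions made for this example; the kernel's real estimator
is considerably more involved.

```python
def clamp_bw(estimate, min_bw, max_bw):
    """Restrict an estimate to [min_bw, max_bw]; a limit of 0 means unset."""
    if min_bw and estimate < min_bw:
        return min_bw
    if max_bw and estimate > max_bw:
        return max_bw
    return estimate


def update_estimate(old_bw, measured_bw, min_bw=0, max_bw=0):
    """Hypothetical smoothing step: move 1/8 of the way toward the sample."""
    new_bw = old_bw + (measured_bw - old_bw) // 8
    return clamp_bw(new_bw, min_bw, max_bw)


def simulate(samples, start_bw, min_bw=0, max_bw=0):
    """Feed a series of measured samples through the estimator."""
    bw = start_bw
    for sample in samples:
        bw = update_estimate(bw, sample, min_bw, max_bw)
    return bw
```

In this toy model, a long run of unrepresentatively low samples drags an
unclamped estimate all the way down, while a configured minimum acts as a
floor that the estimate cannot fall below.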

This looks like a bug in the dirty throttling heuristics that may be
affecting multiple FUSE-based filesystems.

Ideally the solution should be a fix to those heuristics, not adding more
knobs.


Agreed.
+1

Is there a fundamental reason why that can't be done? Maybe the
heuristics need to detect the fact that steady state has not been
reached, and not modify the bandwidth in that case, or something along
those lines.
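
One possible reading of this suggestion, as a rough sketch only: skip the
bandwidth update whenever the measurement window is too short to reflect
steady state. Using elapsed time as the steady-state test, the threshold
value, and the names below are all assumptions made for illustration; none
of them come from the kernel.

```python
MIN_WINDOW_MS = 200  # assumed threshold, not a kernel constant


def maybe_update(old_bw, bytes_written, elapsed_ms):
    """Return a new estimate, or the old one if the window is too short."""
    if elapsed_ms < MIN_WINDOW_MS:
        # The write ended before steady state was reached: ignore the sample.
        return old_bw
    measured = bytes_written * 1000 // elapsed_ms  # bytes per second
    # Same hypothetical 1/8 smoothing step as a simple moving estimate.
    return old_bw + (measured - old_bw) // 8
```

With such a guard, a short burst that finishes early would leave the
previous estimate untouched instead of pulling it down.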
Maybe, but I don’t have the expertise, motivation or time to
investigate this any further, let alone commit to get it done.
During our previous discussion I got the impression that nobody else
had any cycles for this either:
https://lore.kernel.org/linux-fsdevel/CANnVG6n=ySfe1gOr=0ituQidp56idGARDKHzP0hv=ERedeMrMA@mail.gmail.com/
Its timestamp is Mon, 9 Mar 2020 16:11:41 +0100
Have you had a look at the China LSF report at
http://bardofschool.blogspot.com/2011/?
The author of the heuristic has spent significant effort and time
coming up with what we currently have in the kernel:

"""
Fengguang said he draw more than 10K performance graphs and read even
more in the past year.
"""

This implies that making changes to the heuristic will not be a quick fix.
The 2019 attempt [01] IIRC was trying to cut the heuristics.
I think adding these limit knobs could be useful regardless of the
specific heuristic behavior.
The knobs are certainly easy to understand, safe to introduce (no regressions),
and can be used to fix the issue at hand as well as other issues (if
any, now or in the future).

Thanks
Best regards
Michael
[01] https://lore.kernel.org/lkml/20191118082559.GJ6910@shao2-debian/