Re: [PATCH] backing_dev_info: introduce min_bw/max_bw limits
From: Hillf Danton <hidden>
Date: 2021-06-18 09:13:01
On Fri, 18 Jun 2021 10:31:35 +0200 Michael Stapelberg wrote:
Hey Miklos,

Thanks for taking a look!

On Fri, 18 Jun 2021 at 10:18, Miklos Szeredi [off-list ref] wrote:
On Thu, 17 Jun 2021 at 11:53, Michael Stapelberg [off-list ref] wrote:
These new knobs allow e.g. FUSE file systems to guide kernel memory writeback bandwidth throttling.

Background:

When using mmap(2) to read/write files, the page-writeback code tries to measure how quick file system backing devices (BDI) are able to write data, so that it can throttle processes accordingly.

Unfortunately, certain usage patterns, such as linkers (tested with GCC, but also the Go linker) seem to hit an unfortunate corner case when writing their large executable output files: the kernel only ever measures the (non-representative) rising slope of the starting bulk write, but the whole file write is already over before the kernel could possibly measure the representative steady-state.

As a consequence, with each program invocation hitting this corner case, the FUSE write bandwidth steadily sinks in a downward spiral, until it eventually reaches 0 (!). This results in the kernel heavily throttling page dirtying in programs trying to write to FUSE, which in turn manifests itself in slow or even entirely stalled linker processes.

Change:

This commit adds 2 knobs which allow avoiding this situation entirely on a per-file-system basis by restricting the minimum/maximum bandwidth.

This looks like a bug in the dirty throttling heuristics, that may be affecting multiple fuse based filesystems. Ideally the solution should be a fix to those heuristics, not adding more knobs.

Agreed.
+1
Is there a fundamental reason why that can't be done? Maybe the heuristics need to detect the fact that steady state has not been reached, and not modify the bandwidth in that case, or something along those lines.

Maybe, but I don’t have the expertise, motivation or time to investigate this any further, let alone commit to get it done. During our previous discussion I got the impression that nobody else had any cycles for this either: https://lore.kernel.org/linux-fsdevel/CANnVG6n=ySfe1gOr=0ituQidp56idGARDKHzP0hv=ERedeMrMA@mail.gmail.com/
Its timestamp is Mon, 9 Mar 2020 16:11:41 +0100
Have you had a look at the China LSF report at http://bardofschool.blogspot.com/2011/? The author of the heuristic has spent significant effort and time coming up with what we currently have in the kernel:

"""
Fengguang said he draw more than 10K performance graphs and read even more in the past year.
"""

This implies that making changes to the heuristic will not be a quick fix.
The 2019 attempt [01] IIRC was trying to cut the heuristics.
I think adding these limit knobs could be useful regardless of the specific heuristic behavior. The knobs are certainly easy to understand, safe to introduce (no regressions), and can be used to fix the issue at hand as well as other issues (if any, now or in the future).

Thanks
Best regards
Michael
[01] https://lore.kernel.org/lkml/20191118082559.GJ6910@shao2-debian/