Thread: 17 messages, 8 authors, 2022-02-12

Re: [PATCH] mm/vmscan: add sysctl knobs for protecting the working set

From: Barry Song <hidden>
Date: 2021-12-13 09:06:55
Also in: linux-fsdevel, linux-mm, lkml

On Mon, Dec 13, 2021 at 10:23 AM Alexey Avramov [off-list ref] wrote:
quoted
I don't think that the limits should be "N bytes on the current node".
It's not a problem to add _ratio knobs. How the tunables should look and
what their default values should be can still be discussed. Now my task is
to prove that the problem exists and the solution I have proposed is
effective and correct.
quoted
the various zones have different sizes as well.
I'll just point out the precedent: sc->file_is_tiny works the same way
(per node) as suggested sc->clean_below_min etc.
quoted
We do already have a lot of sysctls for controlling these sort of
things.
There are many of them, but the most important ones for solving the
problem are missing: those that are proposed in the patch.
quoted
Was much work put into attempting to utilize the existing
sysctls to overcome these issues?
Oh yes! This is all I have been doing for the last 4 years. At the end of
2017, I was forced to write my own userspace OOM killer [1] to resist
freezes (I didn't know then that earlyoom already existed).
I'd like to understand the problem with the existing sysctls. For example,
if we want to keep more free memory, min_free_kbytes can help. On
the other hand, if we want to keep more file-backed memory, a high
swappiness will help.
I believe you have tried all of the above and they have all failed to satisfy
your use cases, but I would really like a more detailed explanation of why
they don't work.
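
For reference, a minimal sketch of how the current values of the two
existing knobs mentioned above could be inspected from userspace. It assumes
the usual /proc/sys/vm layout; the helper name read_vm_sysctls is
hypothetical and is not part of the patch or of any tool discussed here.

```python
from pathlib import Path


def read_vm_sysctls(names, base="/proc/sys/vm"):
    """Return {name: int or None} for each requested vm sysctl.

    `base` is a parameter only so the helper can be pointed at a
    different directory; on a real system the default is used.
    """
    values = {}
    for name in names:
        try:
            values[name] = int((Path(base) / name).read_text().strip())
        except (OSError, ValueError):
            # Knob is absent, unreadable, or not a single integer.
            values[name] = None
    return values


if __name__ == "__main__":
    for knob, value in read_vm_sysctls(["min_free_kbytes", "swappiness"]).items():
        print(f"vm.{knob} = {value}")
```

A knob that does not exist on the running kernel (for example the proposed
vm.clean_min_kbytes on an unpatched kernel) is reported as None.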
In 2018, Facebook came on the scene with its oomd [2]:
quoted
The traditional Linux OOM killer works fine in some cases, but in others
it kicks in too late, resulting in the system entering a livelock for an
indeterminate period.
Here we can assume that Facebook's engineers haven't found the kernel
sysctl tunables that would satisfy them.

In 2019, people on LKML could not offer Artem S. Tashkinov a simple solution
to the problem he described [3]. In addition to discussing user-space
solutions, two kernel-side solutions were proposed:

- A PSI-based solution was proposed by Johannes Weiner [4].
- Reserving a fixed (configurable) amount of RAM for caches, and triggering
  the OOM killer earlier, before most UI code is evicted from memory, was
  suggested by ndrw [5]. This is what I propose to accept into the mainline.
  It is the right way to go.
Isn't this something like setting a bigger min_free_kbytes?
None of the suggestions posted in that thread were accepted in the
mainline.

In 2019, at the same time, the Fedora Workstation group discussed [6]
Issue #98, "Better interactivity in low-memory situations".
As a result, it was decided to enable earlyoom by default for Fedora
Workstation 32. No existing sysctl was found to be of much help.
It was also suggested to use swap on zram and to enable the cgroup-based
uresourced daemon to protect the user session.

So, the problem described by Artem S. Tashkinov in 2019 is still easily
reproduced in 2021. The assurances of the maintainers that they consider
thrashing and near-OOM stalls to be serious problems are difficult to
take seriously while they ignore the obvious solution: if reclaiming file
caches leads to thrashing, then you just need to prohibit evicting the file
cache, and allow the user to control its minimum amount.
By the way, the implementation of such an idea has been known [7] since
2010 and was even used in Chrome OS.
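
The idea above can be illustrated with a toy model. This is only a sketch of
the decision as described in this message, not code from the patch: the
function name and its parameters are hypothetical, and the real logic lives
in the kernel's reclaim path and works per node.

```python
def may_reclaim_clean_file(clean_file_kb: int, clean_min_kb: int) -> bool:
    """Toy model: may reclaim evict clean file cache right now?

    clean_file_kb -- current amount of clean file-backed memory, in KiB
    clean_min_kb  -- user-chosen floor, standing in for vm.clean_min_kbytes;
                     0 is assumed here to mean "protection disabled"
    """
    if clean_min_kb <= 0:
        return True
    # Below the floor the cache is treated as protected working set.
    # If nothing else is reclaimable, the OOM killer has to act instead
    # of the system thrashing on file pages.
    return clean_file_kb >= clean_min_kb


if __name__ == "__main__":
    floor_kb = 300_000  # the value used in the demo in this message
    for clean_kb in (500_000, 200_000):
        print(clean_kb, may_reclaim_clean_file(clean_kb, floor_kb))
```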

Bonus: demo: https://youtu.be/ZrLqUWRodh4
Debian 11 on VM, Linux 5.14 with the patch, no swap space,
playing SuperTux while 1000 `tail /dev/zero` started simultaneously:
1. No freezes with vm.clean_min_kbytes=300000, I/O pressure was close to
   zero, memory pressure was moderate (70-80 some, 12-17 full), all tail
   processes were killed in about 2 minutes (0:06 - 2:14), which is about
   8 processes reaped by oom_reaper per second;
2. Complete UI freeze without the working set protection (since 3:40).
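
The "some" and "full" memory pressure figures quoted above are PSI values,
which the kernel exposes in /proc/pressure/memory. A minimal sketch of
reading them follows; parse_psi is a hypothetical helper, and which averaging
window (avg10, avg60 or avg300) the demo numbers correspond to is not stated
in this message.

```python
def parse_psi(text):
    """Parse the contents of a /proc/pressure/* file.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    Returns e.g. {"some": {"avg10": 0.0, ..., "total": 0}, "full": {...}}.
    """
    result = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        kind, fields = parts[0], parts[1:]
        result[kind] = {}
        for field in fields:
            key, _, value = field.partition("=")
            # "total" is a stall time counter in microseconds; the avg*
            # fields are percentages.
            result[kind][key] = int(value) if key == "total" else float(value)
    return result


if __name__ == "__main__":
    try:
        with open("/proc/pressure/memory") as fh:
            psi = parse_psi(fh.read())
    except OSError:
        psi = {}  # PSI not enabled or not supported by this kernel
    for kind, values in psi.items():
        print(kind, values.get("avg10"))
```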
I do agree we need some way to stop the thrashing of memory, especially when
free memory is low and we are very close to OOM.
You are mainly mentioning the benefit of keeping shared libraries, so what
is the purpose of vm.anon_min_kbytes?
And will switching between multiple applications in a low-memory situation
still trigger thrashing, for example, one library kicking another library
out, or the anon pages of one application kicking out the anon pages of
another application?
[1] https://github.com/hakavlad/nohang
[2] https://engineering.fb.com/2018/07/19/production-engineering/oomd/
[3] https://lore.kernel.org/lkml/d9802b6a-949b-b327-c4a6-3dbca485ec20@gmx.com/
[4] https://lore.kernel.org/lkml/20190807205138.GA24222@cmpxchg.org/
[5] https://lore.kernel.org/lkml/806F5696-A8D6-481D-A82F-49DEC1F2B035@redhazel.co.uk/
[6] https://pagure.io/fedora-workstation/issue/98
[7] https://lore.kernel.org/lkml/20101028191523.GA14972@google.com/
Thanks
Barry