Thread: 17 messages, 3 authors, 2021-11-19

Re: [PATCH v3 2/4] mm/oom: handle remote ooms

From: Shakeel Butt <hidden>
Date: 2021-11-16 02:01:35
Also in: cgroups, linux-fsdevel

On Mon, Nov 15, 2021 at 2:58 AM Michal Hocko [off-list ref] wrote:
[...]
>> The behavior I saw returning ENOMEM for this edge case was that the
>> code was forever looping the pagefault, and I was (seemingly
>> incorrectly) under the impression that a suggestion to forever loop
>> the pagefault would be completely fundamentally unacceptable.
>
> Well, I have to say I am not entirely sure what is the best way to
> handle this situation. Another option would be to treat this similar to
> an ENOSPC situation. This would result in SIGBUS IIRC.
>
> The main problem with OOM killer is that it will not resolve the
> underlying problem in most situations. Shmem files would likely stay
> laying around and their charge along with them.

This and similar topics were discussed during LSFMM 2019
(https://lwn.net/Articles/787626/).

> Killing the allocating
> task has problems on its own because this could be just a DoS vector by
> other unrelated tasks sharing the shmem mount point without a graceful
> fallback. Retrying the page fault is hard to detect. SIGBUS might be
> something that helps with the latter. The question is how to communicate
> this requirement down to the memcg code to know that the memory reclaim
> should happen (Should it? How hard we should try?) but do not invoke the
> oom killer. The more I think about this the nastier this is.
--
IMHO we should punt the resolution to userspace and keep the
kernel simple. This is an opt-in feature, and the user is expected to
know about and handle exceptional scenarios. The kernel just needs to
tell userspace, somehow, that this exceptional situation is happening.

How about this: for remote ooms, whether or not we are in the page
fault path, keep the allocator looping but keep incrementing a new
memcg event, MEMCG_OOM_NO_VICTIM? Userspace will learn of the
situation either through inotify or polling and can handle it by
either increasing the limit or releasing memory of the monitored
memcg.
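
To make the userspace side of this concrete, here is a rough sketch of a
monitor, assuming the new event would show up as one more counter in a
memory.events-style file of "key value" lines. The key name
"oom_no_victim" and the helper names are made up for illustration; the
proposal only names MEMCG_OOM_NO_VICTIM. The sketch also assumes the file
wakes poll() with POLLPRI on change, and falls back to a periodic re-read
on timeout.

```python
import os
import select

# Hypothetical key: the proposal only names the MEMCG_OOM_NO_VICTIM event;
# how it would be spelled in the events file is an assumption.
NO_VICTIM_KEY = "oom_no_victim"


def parse_events(text):
    """Parse the flat "key value" lines of a memory.events-style file."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def new_no_victim_events(previous, current, key=NO_VICTIM_KEY):
    """Return how far the counter advanced between two snapshots."""
    return max(0, current.get(key, 0) - previous.get(key, 0))


def read_snapshot(fd):
    """Re-read the whole events file from the start."""
    os.lseek(fd, 0, os.SEEK_SET)
    return parse_events(os.read(fd, 4096).decode())


def watch(events_path, on_no_victim, timeout_ms=1000):
    """Call on_no_victim(delta) each time the counter increases."""
    fd = os.open(events_path, os.O_RDONLY)
    try:
        poller = select.poll()
        poller.register(fd, select.POLLPRI | select.POLLERR)
        previous = read_snapshot(fd)
        while True:
            # Returns on a change notification, or after timeout_ms,
            # in which case the re-read below acts as plain polling.
            poller.poll(timeout_ms)
            current = read_snapshot(fd)
            delta = new_no_victim_events(previous, current)
            if delta:
                on_no_victim(delta)
            previous = current
    finally:
        os.close(fd)
```

The on_no_victim callback is where the policy would live, e.g. raising
the limit of the monitored memcg or releasing some of its memory.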

thanks,
Shakeel