Re: [RESEND v12 0/6] cgroup-aware OOM killer

From: Michal Hocko <mhocko@kernel.org>
Date: 2017-11-01 07:38:04
Also in: linux-mm, lkml

On Tue 31-10-17 15:21:23, David Rientjes wrote:

On Tue, 31 Oct 2017, Michal Hocko wrote:


I'm not ignoring them; I have stated that we need the ability to protect 
important cgroups on the system without oom disabling all attached 
processes.  If that is implemented as a memory.oom_score_adj with the same 
semantics as /proc/pid/oom_score_adj, i.e. a proportion of available 
memory (the limit), it can also address the issues pointed out with the 
hierarchical approach in v8.

No, it cannot, and it would be a terrible interface to have as well. You
do not want to permanently tune oom_score_adj to compensate for
structural restrictions on the hierarchy.

memory.oom_score_adj would never need to be permanently tuned, just as 
/proc/pid/oom_score_adj need never be permanently tuned.  My response was 
an answer to Roman's concern that "v8 has it's own limitations," but I 
haven't seen a concrete example where the oom killer is forced to kill 
from the non-preferred cgroup while the user has the power to bias against 
certain cgroups with memory.oom_score_adj.  Do you have such a concrete 
example that we can work with?

Yes, the one with structural requirements due to other controllers or
for general organizational purposes, where hierarchical (sibling-oriented)
comparison just doesn't work. Take the students, teachers, admins
example. You definitely do not want to kill from the students'
subgroups by default just because that is the largest entity type.
Tuning memory.oom_score_adj doesn't work for that use case as soon as
new subgroups come and go.
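To make the students/teachers/admins argument concrete, here is a minimal Python model of sibling-oriented selection with a per-cgroup bias. It assumes a hypothetical memory.oom_score_adj that scales the way /proc/pid/oom_score_adj does in oom_badness() (mm/oom_kill.c): each unit of the -1000..1000 range is worth 1/1000 of the available memory, and -1000 exempts the candidate. The Cgroup class, the page counts, and the "descend into the highest-scoring sibling" rule are illustrative assumptions, not code from the v8 or v12 patch series.

```python
from __future__ import annotations

from dataclasses import dataclass, field

OOM_SCORE_ADJ_MIN = -1000  # same range and meaning as /proc/pid/oom_score_adj


@dataclass
class Cgroup:
    name: str
    usage: int = 0          # pages charged directly to this group (illustrative)
    oom_score_adj: int = 0  # hypothetical memory.oom_score_adj; 0 means no bias
    children: list[Cgroup] = field(default_factory=list)

    def total_usage(self) -> int:
        # Hierarchical accounting: a node's usage includes its whole subtree
        return self.usage + sum(c.total_usage() for c in self.children)


def badness(usage: int, adj: int, total_pages: int) -> int:
    """Score a candidate the way oom_badness() scores a task (simplified)."""
    if adj == OOM_SCORE_ADJ_MIN:
        return 0  # exempt from selection
    # Each adj unit is worth 1/1000 of the memory available in the OOM domain
    return max(usage + adj * (total_pages // 1000), 1)


def select_victim(root: Cgroup, total_pages: int) -> str:
    """Sibling-oriented descent: repeatedly enter the highest-scoring child.

    Simplifications: ties go to the first child, and an exempt (score 0)
    child can still be entered if all of its siblings are exempt too.
    """
    node = root
    while node.children:
        node = max(
            node.children,
            key=lambda c: badness(c.total_usage(), c.oom_score_adj, total_pages),
        )
    return node.name
```

With students split into three 100,000-page subgroups, a teacher using 250,000 pages, and 1,000,000 pages available, the descent picks a student subgroup even though the teacher is the single largest consumer. Setting memory.oom_score_adj to -100 on students (worth 100,000 pages here) redirects the kill to the teacher, but adding a fourth 100,000-page student subgroup puts students back on top, so the bias has to be retuned every time the set of subgroups changes.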


I believe, and Roman has already pointed this out as well, that further
improvements can be implemented as an add-on without changing user
visible behavior. If you disagree, then you had better come up with
solid proof that all of us are wrong and that reasonable semantics
cannot be achieved that way.

We simply cannot determine if improvements can be implemented in the 
future without user-visible changes if those improvements are unknown or 
undecided at this time.

Come on. There have been at least two examples of how this could be
achieved. One is priority based: it would use cumulative memory
consumption when set on intermediate nodes, which would allow you to
compare siblings. The other was to add a new knob which would make
an intermediate node an aggregate for accounting purposes.
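The second proposal can be illustrated with a minimal Python model. The aggregate flag below is a hypothetical stand-in for such a knob, not an existing cgroup file: by default, leaf cgroups are compared against each other system-wide, while a node marked as an aggregate is compared as a single entity using its cumulative usage. Names and page counts are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cgroup:
    name: str
    usage: int = 0           # pages charged directly to this group (illustrative)
    aggregate: bool = False  # hypothetical knob: compare this subtree as one unit
    children: list[Cgroup] = field(default_factory=list)

    def total_usage(self) -> int:
        # Cumulative usage of the node and everything below it
        return self.usage + sum(c.total_usage() for c in self.children)


def candidates(node: Cgroup) -> list[tuple[str, int]]:
    """Flatten the tree into (name, usage) entities compared system-wide.

    Simplification: usage charged directly to a non-aggregate intermediate
    node is ignored; only its descendants become candidates.
    """
    if node.aggregate or not node.children:
        # An aggregate node, or a leaf, is a single entity
        return [(node.name, node.total_usage())]
    out: list[tuple[str, int]] = []
    for child in node.children:
        out.extend(candidates(child))
    return out


def select_victim(root: Cgroup) -> str:
    # The largest entity wins, wherever it sits in the tree (ties: first found)
    name, _ = max(candidates(root), key=lambda item: item[1])
    return name
```

Leaving the flag unset keeps the default leaf comparison, so a 250,000-page teacher is chosen over three 100,000-page student subgroups; setting the flag on students makes their combined 300,000 pages compete as one unit. Because the default path never consults the flag, a user who does not set it sees the same selection as before.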

It may require hierarchical accounting when 
making a choice between siblings, as suggested with oom_score_adj.  The 
only thing that we need to agree on is that userspace needs to have some 
kind of influence over victim selection: the oom killer killing an 
important user process is an extremely sensitive thing.

And I am pretty sure we have already agreed that something like this is
useful for some use cases and that nobody objected to it getting merged
in the future. All we are saying now is that this is not in the scope of
_this_ patch series, because the vast majority of use cases simply do
not care about influencing the oom selection. They only care about
having per-cgroup behavior and/or kill-all semantics. I really do not
understand what is hard about that.

If the patchset 
lacks the ability to have that influence, and such an ability would impact 
the heuristic overall, it's better to introduce that together as a 
complete patchset rather than merging an incomplete feature when it's 
known the user needs some control, asking the user to work around it by 
setting all processes to oom disabled in a preferred mem cgroup, and then 
changing the heuristic again.

I believe we can introduce new knobs without influencing those who do
not set them and I haven't heard any argument which would say otherwise.

-- 
Michal Hocko
SUSE Labs
