Thread (52 messages) 52 messages, 6 authors, 2020-02-27

Re: [PATCH v2 3/3] mm: memcontrol: recursive memory.low protection

From: Michal Koutný <hidden>
Date: 2020-02-25 13:37:30
Also in: linux-mm, lkml

Hello.

On Fri, Feb 21, 2020 at 01:58:39PM -0500, Johannes Weiner [off-list ref] wrote:
When you set task's and logger's memory.low to "max" or 10G or any
bogus number like this, a limit reclaim in job treats this as origin
protection and tries hard to avoid reclaiming anything in either of
the two cgroups.
What do you mean by origin protection? (I'm starting to see some
misunderstanding here, c.f. my remark regarding the parent==root
condition in the other patch [1]).
memory.events::low skyrockets even though no intended
protection was violated, we'll have reclaim latencies (especially when
there are a few dying cgroups accumluated in subtree).
Hopefully, I see where are you coming from. There would be no (false)
low notifications if the elow was calculated all they way top-down from
the real root. Would such calculation be the way to go?
that job can't possibly *know* about the top-level host
protection that lies beyond the delegation point and outside its own
namespace,
Yes, I agree.
and that it needs to propagate protection against rpm upgrades into
its own leaf groups for each tasklet and component.
If a job wants to use concrete protection than it sets it, if it wants
to use protection from above, then it can express it with the infinity
(after changing the effective calculation I described above).

Now, you may argue that the infinity would be nonsensical if it's not a
subordinate job. Simplest approach would be likely to introduce the
special "inherit" value (such a literal name may be misleading as it
would be also "dont-care").
Again, in practice we have found this to be totally unmanageable and
routinely first forgot and then had trouble hacking the propagation
into random jobs that create their own groups.
I've been bitten by this as well. However, the protection defaults to
off and I find it this matches the general rule that kernel provides the
mechanism and user(space) the policy.
And when you add new hardware configurations, you cannot just make a
top-level change in the host config, you have to update all the job
specs of workloads running in the fleet.
(I acknowledge the current mechanism lacks an explicit way to express
the inherit/dont-care value.)

My patch brings memory configuration in line with other cgroup2
controllers.
Other controllers mostly provide the limit or weight controls, I'd say
protection semantics is specific only to the memory controller so
far [2]. I don't think (at least by now) it can be aligned as the weight
or limit semantics.
I've made the case why it's not a supported usecase, and why it is a
meaningless configuration in practice due to the way other controllers
already behave.
I see how your reasoning works for limits (you set memory limit and you
need to control io/cpu too to maintain intended isolation). I'm confused
why having a scapegoat (or donor) sibling for protection should not be
supported or how it breaks protection for others if not combined with
io/cpu controllers. Feel free to point me to the message if I overlooked
it.
I think at this point in the discussion, the only thing I can do is
remind you 
I think there is different perception on both sides because of unclear
definitions, so I seek to clarify that.
that the behavior I'm introducing is gated behind a mount
option that nobody is forced to enable if they insist on disagreeing
against all evidence to the contrary.
Sure, but I think it's better to (try reaching|reach) a consensus than
to introduce split behavior (maintenance, debugging).

Thanks,
Michal

[1] https://lore.kernel.org/lkml/20200221171024.GA23476-9OudH3eul5jcvrawFnH+a6VXKuFTiq87@public.gmane.org/
[2] I admit I didn't look into io.latency that much and IIRC
    cpu.uclamp.{min,max} don't have the overcommit issue.

Attachments

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help