Thread: 52 messages, 6 authors, 2020-02-27

Re: [PATCH v2 3/3] mm: memcontrol: recursive memory.low protection

From: Michal Koutný <hidden>
Date: 2020-02-27 13:35:53
Also in: linux-mm, lkml

TL;DR: I see merit in the recursive propagation if it's requested
explicitly (i.e. retaining the meaning of 0). The protection/weight
semantics should be refined.

On Wed, Feb 26, 2020 at 10:05:48AM -0500, Johannes Weiner [off-list ref] wrote:
They still ultimately translate to real resources. The concrete value
depends on what the parent's weight translates to, and it depends on
sibling configurations and their current consumption. (All of this is
already true for memory protection as well, btw). But eventually, a
weight specification translates to actual time on a CPU, bandwidth on
an IO device etc.
- sum of sibling weights is meaningless (and independent of the parent
  weight)
Technically true for overcommitted memory.low values as well.
Yes, but only for overcommitted values. For pure weights it doesn't matter
whether you set 1:10, 10:100 or 100:1000. Protection has this property
only in the limit of very large values: because the sum of the siblings'
values is compared against the parent's value, the protection otherwise
behaves differently.
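To make the difference concrete, here is a minimal sketch contrasting pure weights with a simplified, non-recursive memory.low split among siblings. It assumes every child uses at least its memory.low (so the claimed protection equals the setting) and ignores memory.min and the other details of the real reclaim code; the function names are hypothetical.

```python
from fractions import Fraction

def weight_shares(weights):
    # Pure weights: a child's share is w_i / sum(w), so scaling every
    # weight by the same factor never changes the outcome.
    total = sum(weights)
    return [Fraction(w, total) for w in weights]

def effective_low(parent_effective, lows):
    # Simplified, non-recursive memory.low split. Assumes every child uses
    # at least its memory.low, so the claimed protection equals the setting.
    claimed = sum(lows)
    if claimed > parent_effective:
        # Overcommitted: the parent's protection is split in proportion
        # to the claims, i.e. the values act like weights.
        return [Fraction(parent_effective * low, claimed) for low in lows]
    # Not overcommitted: each child simply gets its absolute value.
    return [Fraction(low) for low in lows]

# Parent effective protection of 10 (GiB, say).
for lows in ([1, 10], [10, 100], [100, 1000], [Fraction(1, 10), 1]):
    print(lows, weight_shares(lows), effective_low(10, lows))
```

The 1:10 ratio yields identical weight shares at every scale, and as protection it behaves like a weight only while it is overcommitted (10/11 G vs 100/11 G for 1:10, 10:100 and 100:1000); 0.1:1 has the same ratio but is handed out as absolute amounts.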

[If there had to be some pure memory weights, those would, for
instance, express the relative affinity of a group's pages to physical memory.]
I don't see a fundamental difference between them. And that in turn
makes it hard for me to accept that hierarchical inheritance rules
should be different.
I'll try coming up with some better examples for the difference that I
perceive.
"Wrong" isn't the right term. Is it what you wanted to express in your
configuration?
I want to express an absolute amount of memory (ideally representing the
working set size) under protection.

IIUC, you want to express general relative priorities of B vs C when
some outer metric has to be maintained, given that both the memory and
the IO limits are reached.
You are talking about a mathematical truth on a per-controller
basis. What I'm saying is that I don't see how this is useful for real
workloads, their relative priorities, and the performance expectations
users have from these priorities.
 
With a priority inversion like this, there is no actual performance
isolation or containerization going on here - which is the whole point
of cgroups and resource control.
I acknowledge that by pressing too much along one dimension (memory) you
induce expansion in another dimension (IO), and that may become noticeable
in siblings (expansion over saturation [1]). But that's expected when only
weights are in use. If you wanted to hide the effect of workload B from C,
B would need a real limit.

[I beg to disagree that containerization is the whole point of cgroups;
it's a large part of it, hence the isolation needn't necessarily be
bidirectional.]
My objection is to opting out of protection against cousins (thus
overriding parental resource assignment), not against siblings.
Just to sync up the terminology: I'm calling this protection against
uncles (the composition/structure under them is irrelevant).
And the limitation comes from the grandparent or higher (or global).

...and the overridden parental resource assignment is the expansion in a
non-memory dimension (IO/CPU).
Correct, but you can change the tree to this:

     A.low=10G
     `- A1.low=10G
        `- B.low=0G
        `- C.low=0G
     `- D.low=0G

to express

A1 > D
 B = C
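A rough sketch of how the recursive mode proposed in this series would treat that tree: claimed protection is honoured (and scaled down under overcommit), and any protection the children leave unclaimed is handed out in proportion to their unprotected usage. This is loosely modelled on my reading of effective_protection() in mm/memcontrol.c; the usage figures are made up, the `recursive` flag stands in for whatever opt-in switch enables the new behaviour, and memory.min, root special-casing and racy reads are all ignored.

```python
from dataclasses import dataclass, field
from fractions import Fraction

@dataclass
class Cgroup:
    """Hypothetical model of a cgroup: memory.low and usage in GiB."""
    name: str
    low: int
    usage: int = 0  # pages charged directly to this group (leaves only here)
    children: list = field(default_factory=list)

    def total_usage(self):
        # Hierarchical usage: own pages plus everything charged below.
        return self.usage + sum(c.total_usage() for c in self.children)

def effective_protection(usage, parent_usage, setting, parent_effective,
                         siblings_protected, recursive):
    protected = min(usage, setting)
    # Overcommit: siblings claim more than the parent has, so scale down.
    if siblings_protected > parent_effective:
        return Fraction(protected * parent_effective, siblings_protected)
    ep = Fraction(protected)
    if not recursive:
        return ep
    # Recursive mode: hand out the parent's unclaimed protection in
    # proportion to each child's unprotected usage.
    if (parent_effective > siblings_protected
            and parent_usage > siblings_protected
            and usage > protected):
        unclaimed = parent_effective - siblings_protected
        ep += unclaimed * Fraction(usage - protected,
                                   parent_usage - siblings_protected)
    return ep

def distribute(parent, parent_effective, recursive, out=None):
    """Walk the tree top-down, recording each child's effective memory.low."""
    if out is None:
        out = {}
    parent_usage = parent.total_usage()
    siblings_protected = sum(min(c.total_usage(), c.low) for c in parent.children)
    for child in parent.children:
        ep = effective_protection(child.total_usage(), parent_usage, child.low,
                                  parent_effective, siblings_protected, recursive)
        out[child.name] = ep
        distribute(child, ep, recursive, out)
    return out

# The tree above, with made-up usage figures (GiB).
tree = Cgroup("A", low=10, children=[
    Cgroup("A1", low=10, children=[
        Cgroup("B", low=0, usage=6),
        Cgroup("C", low=0, usage=6),
    ]),
    Cgroup("D", low=0, usage=8),
])

print(distribute(tree, tree.low, recursive=True))
print(distribute(tree, tree.low, recursive=False))
```

With these numbers the recursive walk gives A1 10G, B and C 5G each and D nothing (A1 > D, B = C), while the non-recursive walk leaves B and C at 0 despite A1's 10G.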
That sort of works (if I give up the scapegoat), although I have trouble
with having to copy the value from A to A1; I could have done that with
the previous hierarchy and simply set B.low=C.low=10G.
That is, I would like to see an argument for this setup:

     A				
     `- B		io.weight=200          memory.low=10G
        `- D		io.weight=100 (e.g.)   memory.low=10G
        `- E		io.weight=100 (e.g.)   memory.low=0
     `- C		io.weight=50           memory.low=5G

Where E has no memory protection against C, but E has IO priority over
C. That's the configuration that cannot be expressed with a recursive
memory.low, but since it involves priority inversions it's not useful
to actually isolate and containerize workloads.
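The IO side of the inversion can be quantified with an idealized model of hierarchical io.weight: assuming every leaf is saturating the device, a child's share is its parent's share times w / sum(sibling w). This is a sketch only; the real IO controllers approximate proportional sharing under contention, and the tuple layout is hypothetical.

```python
from fractions import Fraction

# The setup above as (name, io.weight, memory.low in GiB, children).
TREE = ("A", None, None, [
    ("B", 200, 10, [
        ("D", 100, 10, []),
        ("E", 100, 0, []),
    ]),
    ("C", 50, 5, []),
])

def io_shares(node, share=Fraction(1), out=None):
    # Idealized top-down split: child share = parent share * w / sum(w).
    if out is None:
        out = {}
    _name, _weight, _low, children = node
    total = sum(weight for _, weight, _, _ in children)
    for child in children:
        name, weight, _low, _ = child
        child_share = share * Fraction(weight, total)
        out[name] = child_share
        io_shares(child, child_share, out)
    return out

print(io_shares(TREE))
```

Under this model E gets 2/5 of the device versus C's 1/5, while E's memory.low of 0 gives it no memory protection against C in the non-recursive scheme.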
But there may be no cousin (uncle) at all or, more precisely, it's the
global rest that we don't mind affecting.
I'd say that protected memory is a disposable resource in contrast with
CPU/IO. If you don't have the latter, you don't progress; if you lack the
former, you are refaulting but can make progress. Even more, you should
be able to give up memory.min.
Eh, I'm not buying that. You cannot run without memory either. If
somebody reclaims a page between you faulting it in and you resuming
to userspace, there is no forward progress.
I made a hasty argument (misinterpreting the constant outer reclaim
pressure). So that wasn't the fundamental difference.

The second part: memory.min is subject to the same calculation as
memory.low. Do you find the scapegoat preventing OOM in the grandparent
(or higher) subtree also a misfeature/artifact?

Thanks,
Michal

[1] Here I'm taking your/Tejun's assumption that in stressful
situations it always boils down to IO, although I don't have any
quantitative arguments for that.
