Thread (52 messages) 52 messages, 6 authors, 2020-02-27

Re: [PATCH v2 3/3] mm: memcontrol: recursive memory.low protection

From: Johannes Weiner <hidden>
Date: 2020-02-26 15:05:58
Also in: linux-mm, lkml

Hello,

On Wed, Feb 26, 2020 at 02:22:37PM +0100, Michal Koutný wrote:
On Tue, Feb 25, 2020 at 10:03:04AM -0500, Johannes Weiner [off-list ref] wrote:
quoted
Can you explain why you think protection is different from a weight?
- weights are dimension-less, they represent no real resource
They still ultimately translate to real resources. The concrete value
depends on what the parent's weight translates to, and it depends on
sibling configurations and their current consumption. (All of this is
already true for memory protection as well, btw). But eventually, a
weight specification translates to actual time on a CPU, bandwidth on
an IO device etc.
- sum of sibling weights is meaningless (and independent from parent
  weight)
Technically true for overcommitted memory.low values as well.
- to me this protection is closer to limits (actually I like your simile
  that they're very lazily enforced limits)
But weights are also lazily enforced limits. Without competition, you
can get 100% regardless of your weight; under contention, you get
throttled/limited back to an assigned share, however that's specified.

Once you peel away the superficial layer of how resources, or shares
of resources, are being referred to (time, bytes, relative shares)
weights/guarantees/protections are all the same thing: they are lazily
enforced* partitioning rules of a resource under contention.

I don't see a fundamental difference between them. And that in turn
makes it hard for me to accept that hierarchical inheritance rules
should be different.

* We also refer to this as work-conserving
quoted
Now you apply memory pressure, what happens?. D isn't reclaimed, C is
somewhat reclaimed, E is reclaimed hard. D will not page, C will page
a little bit, E will page hard *with the higher IO priority of B*.

Now C is stuck behind E. This is a priority inversion.
This is how I understand the weights to work.

    A				
    `- B		io.weight=200
       `- D		io.weight=100 (e.g.)
       `- E		io.weight=100 (e.g.)
    `- C		io.weight=50

Whatever weights I assign to D and E, when only E and C compete, E will
have higher weight (200 to 50, work-conservacy of weights).
Yes, exactly. I'm saying the same should be true for memory.
I don't think this inversion is wrong because E's work is still on
behalf of B.
"Wrong" isn't the right term. Is it what you wanted to express in your
configuration?

What's the point of designating E a memory donor group that needs to
relinquish memory to C under pressure, but when it actually gives up
that memory it beats C in competition over a different resource?

You are talking about a mathematical truth on a per-controller
basis. What I'm saying is that I don't see how this is useful for real
workloads, their relative priorities, and the performance expectations
users have from these priorities.

With a priority inversion like this, there is no actual performance
isolation or containerization going on here - which is the whole point
of cgroups and resource control.
Or did you mean that if protections were transformed (via effective
calculation) to have ratios only in the same range as io.weights
(1e-4..1e4 instead of 0..inf), then it'd prevent the inversion? (By
setting D,E weights in same ratios as D,E protections.)
No, the inversion would be prevented if E could consume all resources
assigned to B that aren't consumed by D.

This is true for IO and CPU, but before my patch not for memory.
quoted
1. Can you please make a practical use case for having scape goats or
   donor groups to justify retaining what I consider to be an
   unimportant artifact in the memory.low semantics?
    A.low=10G
    `- B.low=X   u=6G
    `- C.low=X   u=4G
    `- D.low=0G  u=5G

B,C   run the workload which should be protected
D     runs job that doesn't need any protection 
u     denotes usage
(I made the example with more than one important sibling to illustrate
usefulness of some implicit distribution X.)

When outer reclaim comes, reclaiming from B,C would be detrimental to
their performance, while impact on D is unimportant. (And induced IO
load on the rest (out of A) too.)
Okay, but this is a different usecase than we were talking about.

My objection is to opting out of protection against cousins (thus
overriding parental resource assignment), not against siblings.

Expressing priorities between siblings like this is fine. And I
absolutely see practical value in your specific example.
It's not possible to move D to the A's level, since only A is all what a
given user can control.
Correct, but you can change the tree to this:

     A.low=10G
     `- A1.low=10G
        `- B.low=0G
        `- C.low=0G
     `- D.low=0G

to express

A1 > D
 B = C

That priority order can be matched by CPU and IO controls as well:

     A.weight=100
     `- A1.weight=100
        `- B.weight=100
        `- C.weight=100
     `- D.weight=1

My objection is purely about opting out of resources relative to (and
assuming a lower memory priority than) an outside cousin that may have
a lower priority on other resources.

That is, I would like to see an argument for this setup:

     A				
     `- B		io.weight=200          memory.low=10G
        `- D		io.weight=100 (e.g.)   memory.low=10G
        `- E		io.weight=100 (e.g.)   memory.low=0
     `- C		io.weight=50           memory.low=5G

Where E has no memory protection against C, but E has IO priority over
C. That's the configuration that cannot be expressed with a recursive
memory.low, but since it involves priority inversions it's not useful
to actually isolate and containerize workloads.

That's why I'm saying it's an artifact, not an actual feature.
quoted
2. If you think opting out of hierarchically assigned resources is a
   fundamentally important usecase, can you please either make an
   argument why it should also apply to CPU and IO, or alternatively
   explain in detail why they are meaningfully different?
I'd say that protected memory is a disposable resource in contrast with
CPU/IO. If you don't have latter, you don't progress; if you lack the
former, you are refaulting but can make progress. Even more, you should
be able to give up memory.min.
Eh, I'm not buying that. You cannot run without memory either. If
somebody reclaims a page between you faulting it in and you resuming
to userspace, there is no forward progress.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help