Re: [RFC 0/7] Introduce memory allocation speed throttle in memcg

From: yulei zhang <hidden>
Date: 2021-06-03 10:19:32
Also in: linux-mm

On Wed, Jun 2, 2021 at 11:39 PM Shakeel Butt [off-list ref] wrote:

On Wed, Jun 2, 2021 at 2:11 AM yulei zhang [off-list ref] wrote:

On Tue, Jun 1, 2021 at 10:45 PM Chris Down [off-list ref] wrote:

yulei zhang writes:

Yep, dynamically adjusting the memory.high limits can ease the memory pressure
and postpone the global reclaim, but it can easily trigger OOM in
the cgroups,

To go further on Shakeel's point, which I agree with, memory.high should
_never_ result in memcg OOM. Even if the limit is breached dramatically, we
don't OOM the cgroup. If you have a demonstration of memory.high resulting in
cgroup-level OOM kills in recent kernels, then that needs to be provided. :-)

You are right, I mistook it for memory.max. Shakeel means the throttling
applied when a task returns to user space, which uses memory.high as the
threshold to calculate the sleep time. Currently this only applies to
cgroup v2. In this patchset we explore another way to throttle memory
usage, which relies on setting an average allocation speed in the memcg.
We hope to suppress the memory usage of low-priority cgroups once the
system reaches its watermark, while still keeping their activities alive.
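For reference, the memory.high throttle mentioned here puts the task to sleep for a time that grows with the square of how far usage exceeds the limit. Below is a simplified userspace model, loosely following calculate_overage() and calculate_high_delay() in mm/memcontrol.c of v5.x kernels. HZ=1000, the constant values, and the single-counter simplification (no swap overage) are assumptions, and the function names are illustrative, so treat this as a sketch rather than the kernel code.

```c
#include <stdint.h>

/* Assumed constants, modeled on mm/memcontrol.c in v5.x kernels. */
#define HZ 1000UL                              /* assumed tick rate */
#define MEMCG_CHARGE_BATCH 32UL                /* nominal charge batch (pages) */
#define MEMCG_MAX_HIGH_DELAY_JIFFIES (2UL * HZ)
#define MEMCG_DELAY_PRECISION_SHIFT 20
#define MEMCG_DELAY_SCALING_SHIFT 14

/* Overage above memory.high as a fixed-point fraction of high (1.0 == 1 << 20). */
static uint64_t high_overage(uint64_t usage, uint64_t high)
{
	if (usage <= high)
		return 0;
	if (high == 0)
		high = 1; /* treat a zero limit as one page to avoid dividing by 0 */
	return ((usage - high) << MEMCG_DELAY_PRECISION_SHIFT) / high;
}

/* Jiffies to sleep after charging nr_pages; usage and high are in pages. */
uint64_t high_delay_jiffies(uint64_t usage, uint64_t high, uint64_t nr_pages)
{
	uint64_t overage = high_overage(usage, high);
	uint64_t penalty;

	if (!overage)
		return 0;

	/* The penalty grows with the square of the relative overage. */
	penalty = overage * overage * HZ;
	penalty >>= MEMCG_DELAY_PRECISION_SHIFT;
	penalty >>= MEMCG_DELAY_SCALING_SHIFT;

	/* Scale by this charge's size relative to a nominal batch. */
	penalty = penalty * nr_pages / MEMCG_CHARGE_BATCH;

	/* Cap the sleep, and skip sleeps too short to be worth taking. */
	if (penalty > MEMCG_MAX_HIGH_DELAY_JIFFIES)
		penalty = MEMCG_MAX_HIGH_DELAY_JIFFIES;
	if (penalty <= HZ / 100)
		return 0;
	return penalty;
}
```

With these assumed constants, a 10% overage on a 32-page charge yields roughly 640 jiffies (about 0.64 s), a 1% overage stays under the HZ/100 floor and does not sleep at all, and usage at twice the limit hits the 2 s cap. Note that the task only sleeps; nothing here ever invokes the OOM killer.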

I think you need to make the case: why should we add one more form of
throttling? Basically, why is memory.high not good for your use case, and
why does the proposed solution work better? Though IMO it would be a hard
sell.

Thanks. IMHO, these two throttles differ. memory.high is a per-memcg
throttle that aims to limit the memory usage of the tasks in the cgroup.
The memory allocation speed throttle (MST), by contrast, aims to avoid
memory bursts in a cgroup that would trigger global reclaim and affect
timing-sensitive workloads in other cgroups.

For example, suppose we have two pods with memory overcommit enabled, one
running online tasks and the other running offline tasks. If we restrict
the memory usage of the offline pod with memory.high, it loses the benefit
of memory overcommit when the other workloads are idle. On the other hand,
if we don't limit its memory usage, a sudden burst of memory operations
can easily break the system watermark. With MST enabled in this case, we
can avoid direct reclaim and still take advantage of overcommit.
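This message only says that MST enforces an average allocation speed per memcg while the system is below its watermark; the patches' actual mechanism is not shown here. One way to express that idea is a virtual-clock (GCRA-style) rate limiter, sketched below. Everything in it, including struct mst_state, mst_charge() and its parameters, is hypothetical and not taken from the RFC.

```c
#include <stdbool.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/* Hypothetical per-memcg allocation-speed throttle state. */
struct mst_state {
	uint64_t rate_pages_per_sec; /* allowed average allocation speed; 0 = off */
	uint64_t burst_us;           /* budget that may be spent ahead of schedule */
	uint64_t vclock_us;          /* virtual time at which charged pages are paid off */
};

/*
 * Account nr_pages charged at now_us and return how long (microseconds)
 * the allocating task should sleep. Nothing is throttled or accounted
 * while the system is above its free-memory watermark.
 */
uint64_t mst_charge(struct mst_state *s, uint64_t nr_pages,
		    uint64_t now_us, bool below_watermark)
{
	uint64_t lag;

	if (!below_watermark || s->rate_pages_per_sec == 0)
		return 0;

	/* After an idle period, restart from now: at most burst_us of credit. */
	if (s->vclock_us < now_us)
		s->vclock_us = now_us;

	/* Each page costs 1/rate seconds of budget. */
	s->vclock_us += nr_pages * USEC_PER_SEC / s->rate_pages_per_sec;

	/* Sleep for however far charges run ahead of schedule beyond the burst. */
	lag = s->vclock_us - now_us;
	return lag > s->burst_us ? lag - s->burst_us : 0;
}
```

With rate_pages_per_sec = 1000 and burst_us = 100000, a cgroup under pressure can charge 100 pages at once without sleeping, and the next 10 pages cost a 10 ms sleep. Unlike memory.high, this looks only at how fast the cgroup charges pages, not how many it already holds, and it does nothing while memory is plentiful, which is the property the overcommit argument relies on.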