Re: [PATCH v2 2/3] mm: Force update of mem cgroup soft limit tree on usage excess
From: Tim Chen <hidden>
Date: 2021-02-25 22:50:58
Also in: linux-mm, lkml
On 2/24/21 3:53 AM, Michal Hocko wrote:
> On Mon 22-02-21 11:48:37, Tim Chen wrote:
>> On 2/22/21 11:09 AM, Michal Hocko wrote:
>>>> I actually have tried adjusting the threshold but found that it
>>>> doesn't work well for the case with uneven memory access frequency
>>>> between cgroups. The usage of the cgroup with few memory events
>>>> could creep up quite a lot, exceeding the soft limit by hundreds
>>>> of MB, even if I drop the SOFTLIMIT_EVENTS_TARGET from 1024 to
>>>> something like 8.
>>>
>>> What was the underlying reason? Higher order allocations?
>>
>> Not high order allocation. The reason was that the runaway memcg
>> asks for memory much less often than the other memcgs in the system.
>> So it escapes the sampling update, is not put onto the tree, and
>> exceeds the soft limit pretty badly. Even if it was put onto the tree
>> and had pages reclaimed below the limit, it could escape the sampling
>> the next time it exceeds the soft limit.
>
> I am sorry but I really do not follow. Maybe I am missing something
> obvious but the rate of events (charge/uncharge) shouldn't be really
> important. There is no way to exceed the limit without charging memory
> (either a new charge or via task migration in v1 and
> immigrate_on_move). If you have SOFTLIMIT_EVENTS_TARGET 8 then you
> should need 128 * 8 events to re-evaluate. Huge pages can make the
> runaway much bigger, but how would it be possible to run away outside
> of that bound?

Michal,
Let's take an extreme case where memcg 1 always generates the
first event and memcg 2 generates the remaining 128*8-1 events,
and the pattern repeats. The tree update happens on the 128*8th event,
so memcg 1 never triggers it. In this case we will
keep missing memcg 1's events and never put memcg 1 on the tree.
Something like this pattern of memory events
cg1 cg2 cg2 cg2 ....cg2 cg1 cg2 cg2 cg2....cg2 cg1 cg2 .....
                    ^                      ^
               update tree            update tree
Of course in real life the update events are random in nature.
However, because memcg 1 events are rare, the update is unlikely to
land on one of them, so we can miss updating memcg 1 for a long time.
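
To make the pattern above concrete, here is a minimal sketch in Python. It is not kernel code: it assumes a single event counter shared by both cgroups, as in the scenario described above, and the names simulate() and miss_probability() are hypothetical. The second helper assumes events arrive independently, with memcg 1 producing a fraction p of them.

```python
from collections import Counter

PERIOD = 128 * 8  # events between tree updates, as in the example above


def simulate(period, rounds):
    """Count which cgroup is charging each time the update fires.

    Assumption: one shared event counter; cg1 generates the first event
    of every period and cg2 generates all the others.
    """
    updates = Counter()
    events = 0
    for _ in range(rounds):
        for i in range(period):
            cgroup = "cg1" if i == 0 else "cg2"
            events += 1
            if events % period == 0:  # update fires on every period-th event
                updates[cgroup] += 1
    return updates


def miss_probability(p, n):
    """Chance that n independent updates all miss a cgroup that
    generates a fraction p of the events (random-arrival assumption)."""
    return (1.0 - p) ** n


updates = simulate(PERIOD, rounds=100)
print(updates["cg1"], updates["cg2"])
print(miss_probability(1 / PERIOD, PERIOD))
```

In the deterministic pattern the update never lands on a cg1 event. In the random case, with cg1 generating one event in 1024, the chance that 1024 consecutive updates all miss it is still roughly one in three under these assumptions.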
> Btw. do we really need SOFTLIMIT_EVENTS_TARGET at all? Why cannot we
> just stick with a single threshold? mem_cgroup_update_tree can be made
> effectively a noop when there is no soft limit in place, so overhead
> shouldn't matter for the vast majority of workloads.

I think there are two limits because the original code wants
mem_cgroup_threshold to be updated more frequently than the
soft_limit_tree. The soft limit tree update is more costly.

Tim
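
The two-limit idea can be sketched as a nested rate limiter: a cheap check runs often, and the costlier update runs only on some of those occasions. This is an illustrative Python model, not the kernel implementation. The class name, the field names, and the period of 128 for the cheap check are assumptions; 1024 is the SOFTLIMIT_EVENTS_TARGET value quoted earlier in the thread.

```python
CHEAP_PERIOD = 128    # assumed period for the cheap threshold check
COSTLY_PERIOD = 1024  # period for the costlier soft limit tree update


class EventRatelimit:
    """Hypothetical two-tier rate limiter driven by one event counter."""

    def __init__(self, cheap_period=CHEAP_PERIOD, costly_period=COSTLY_PERIOD):
        self.cheap_period = cheap_period
        self.costly_period = costly_period
        self.events = 0
        self.next_cheap = cheap_period
        self.next_costly = costly_period
        self.threshold_checks = 0  # stands in for the cheap check
        self.tree_updates = 0      # stands in for the costly update

    def charge(self):
        """Record one event and run whichever checks are due."""
        self.events += 1
        if self.events >= self.next_cheap:
            self.next_cheap = self.events + self.cheap_period
            self.threshold_checks += 1
            # The costly update is only considered when the cheap one fires.
            if self.events >= self.next_costly:
                self.next_costly = self.events + self.costly_period
                self.tree_updates += 1


rl = EventRatelimit()
for _ in range(4096):
    rl.charge()
print(rl.threshold_checks, rl.tree_updates)
```

With these assumed periods the cheap check runs eight times as often as the costly update, which is the trade-off described above.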