Re: [PATCH] mm: memcontrol: prevent starvation when writing memory.high

From: Roman Gushchin <hidden>
Date: 2021-01-12 21:59:06
Also in: linux-mm, lkml

On Tue, Jan 12, 2021 at 02:45:43PM -0500, Johannes Weiner wrote:

On Tue, Jan 12, 2021 at 09:03:22AM -0800, Roman Gushchin wrote:


On Tue, Jan 12, 2021 at 11:30:11AM -0500, Johannes Weiner wrote:


When a value is written to a cgroup's memory.high control file, the
write() context first tries to reclaim the cgroup to size before
putting the limit in place for the workload. Concurrent charges from
the workload can keep such a write() looping in reclaim indefinitely.

In the past, a write to memory.high would first put the limit in place
for the workload, then do targeted reclaim until the new limit has
been met - similar to how we do it for memory.max. This wasn't prone
to the described starvation issue. However, this sequence could cause
excessive latencies in the workload: allocating threads could be put
into long penalty sleeps on the sudden memory.high overage created by
the write(), before the write() had a chance to work it off.

Now that memory_high_write() performs reclaim before enforcing the new
limit, reflect that the cgroup may well fail to converge due to
concurrent workload activity. Bail out of the loop after a few tries.

Fixes: 536d3bf261a2 ("mm: memcontrol: avoid workload stalls when lowering memory.high")
Cc: <redacted> # 5.8+
Reported-by: Tejun Heo <redacted>
Signed-off-by: Johannes Weiner <redacted>
---
 mm/memcontrol.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 605f671203ef..63a8d47c1cd3 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6275,7 +6275,6 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 
 	for (;;) {
 		unsigned long nr_pages = page_counter_read(&memcg->memory);
-		unsigned long reclaimed;
 
 		if (nr_pages <= high)
 			break;
@@ -6289,10 +6288,10 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 			continue;
 		}
 
-		reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
-							 GFP_KERNEL, true);
+		try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
+					     GFP_KERNEL, true);
 
-		if (!reclaimed && !nr_retries--)
+		if (!nr_retries--)

Shouldn't it be (!reclaimed || !nr_retries) instead?

If reclaimed == 0, it probably doesn't make much sense to retry.

We usually allow nr_retries worth of no-progress reclaim cycles to
make up for intermittent reclaim failures.

The difference from OOMs/memory.max is that we don't want to loop
indefinitely on forward progress, but we should allow the usual number
of no-progress loops.

Re memory.max: trying really hard makes sense because we are OOMing otherwise.
With memory.high such an idea is questionable: if we're not able to reclaim
a single page on the first attempt, it's unlikely that we can reclaim many
by repeating 16 times.

My concern here is that we can see CPU regressions in some cases when there is
no reclaimable memory. Do you think we can win something by trying harder?
If so, it's worth mentioning in the commit log, because it's really a separate
change from what's described there; to some extent, it's a move in the opposite
direction.

Thanks!
