Re: [PATCH] mm: memcontrol: fix network errors from failing __GFP_ATOMIC charges
From: Michal Hocko <mhocko@kernel.org>
Date: 2019-10-24 08:14:50
Also in:
cgroups, linux-mm, lkml
Subsystem:
control group - memory resource controller (memcg), memory management, the rest · Maintainers:
Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Andrew Morton, Linus Torvalds
On Wed 23-10-19 10:38:36, Shakeel Butt wrote:
On Wed, Oct 23, 2019 at 8:46 AM Johannes Weiner [off-list ref] wrote:quoted
On Wed, Oct 23, 2019 at 08:40:12AM +0200, Michal Hocko wrote:
[...]
quoted
quoted
On the other hand this would allow to break the isolation by an unpredictable amount. Should we put a simple cap on how much we can go over the limit. If the memcg limit reclaim is not able to keep up with those overflows then even __GFP_ATOMIC allocations have to fail. What do you think?I don't expect a big overrun in practice, and it appears that Google has been letting even NOWAIT allocations pass through without isolation issues.We have been overcharging for __GFP_HIGH allocations for couple of years and see no isolation issues in the production.quoted
Likewise, we have been force-charging the skmem for a while now and it hasn't been an issue for reclaim to keep up. My experience from production is that it's a whole lot easier to debug something like a memory.max overrun than it is to debug a machine that won't respond to networking. So that's the side I would err on.
It is definitely good to hear that your production systems are working well. I was not really worried about normal workloads but rather malicious kind of (work)loads where memcg is used to contain a potentially untrusted entities. That's where an unbounded atomic charges escapes would be a much bigger deal. Maybe this is not the case now because we do not have that many accounted __GFP_ATOMIC requests (I have tried to audit but gave up very shortly afterwards because there are not that many using __GFP_ACCOUNT directly so they are likely hidden behind SLAB_ACCOUNT). But I do not really like that uncertainty. If you have a really strong opinion on an explicit limit then I would like to see at least some warning to the kernel log so that we learn when some workloads hit a pathological paths that and act upon that. Does that sound like something you would agree to? E.g. something like
diff --git a/mm/page_counter.c b/mm/page_counter.c
index de31470655f6..e6999f6cf79e 100644
--- a/mm/page_counter.c
+++ b/mm/page_counter.c@@ -62,6 +62,8 @@ void page_counter_cancel(struct page_counter *counter, unsigned long nr_pages) WARN_ON_ONCE(new < 0); } +#define SAFE_OVERFLOW 1024 + /** * page_counter_charge - hierarchically charge pages * @counter: counter
@@ -82,8 +84,14 @@ void page_counter_charge(struct page_counter *counter, unsigned long nr_pages) * This is indeed racy, but we can live with some * inaccuracy in the watermark. */ - if (new > c->watermark) + if (new > c->watermark) { c->watermark = new; + if (new > c->max + SAFE_OVERFLOW) { + pr_warn("Max limit %lu breached, usage:%lu. Please report.\n", + c->max, atomic_long_read(&c->usage); + dump_stack(); + } + } } }
--
Michal Hocko
SUSE Labs