Re: [patch] mm: memcontrol: support transparent huge pages under pressure
From: Johannes Weiner <hannes@cmpxchg.org>
Date: 2014-09-23 11:17:13
Also in:
linux-mm, lkml
Subsystem:
control group - memory resource controller (memcg), memory management, the rest · Maintainers:
Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Andrew Morton, Linus Torvalds
On Mon, Sep 22, 2014 at 10:52:50PM -0700, Greg Thelen wrote:
On Fri, Sep 19 2014, Johannes Weiner wrote:quoted
In a memcg with even just moderate cache pressure, success rates for transparent huge page allocations drop to zero, wasting a lot of effort that the allocator puts into assembling these pages. The reason for this is that the memcg reclaim code was never designed for higher-order charges. It reclaims in small batches until there is room for at least one page. Huge pages charges only succeed when these batches add up over a series of huge faults, which is unlikely under any significant load involving order-0 allocations in the group. Remove that loop on the memcg side in favor of passing the actual reclaim goal to direct reclaim, which is already set up and optimized to meet higher-order goals efficiently. This brings memcg's THP policy in line with the system policy: if the allocator painstakingly assembles a hugepage, memcg will at least make an honest effort to charge it. As a result, transparent hugepage allocation rates amid cache activity are drastically improved: vanilla patched pgalloc 4717530.80 ( +0.00%) 4451376.40 ( -5.64%) pgfault 491370.60 ( +0.00%) 225477.40 ( -54.11%) pgmajfault 2.00 ( +0.00%) 1.80 ( -6.67%) thp_fault_alloc 0.00 ( +0.00%) 531.60 (+100.00%) thp_fault_fallback 749.00 ( +0.00%) 217.40 ( -70.88%) [ Note: this may in turn increase memory consumption from internal fragmentation, which is an inherent risk of transparent hugepages. Some setups may have to adjust the memcg limits accordingly to accomodate this - or, if the machine is already packed to capacity, disable the transparent huge page feature. ]We're using an earlier version of this patch, so I approve of the general direction. But I have some feedback. The memsw aspect of this change seems somewhat separate. Can it be split into a different patch? The memsw aspect of this patch seems to change behavior. Is this intended? If so, a mention of it in the commit log would assuage the reader. I'll explain... Assume a machine with swap enabled and res.limit==memsw.limit, thus memsw_is_minimum is true. My understanding is that memsw.usage represents sum(ram_usage, swap_usage). So when memsw_is_minimum=true, then both swap_usage=0 and memsw.usage==res.usage. In this condition, if res usage is at limit then there's no point in swapping because memsw.usage is already maximal. Prior to this patch I think the kernel did the right thing, but not afterwards. Before this patch: if res.usage == res.limit, try_charge() indirectly calls try_to_free_mem_cgroup_pages(noswap=true) After this patch: if res.usage == res.limit, try_charge() calls try_to_free_mem_cgroup_pages(may_swap=true) Notice the inverted swap-is-allowed value.
For some reason I had myself convinced that this is dead code due to a change in callsites a long time ago, but you are right that currently try_charge() relies on it, thanks for pointing it out. However, memsw is always equal to or bigger than the memory limit - so instead of keeping a separate state variable to track when memory failure implies memsw failure, couldn't we just charge memsw first? How about the following? But yeah, I'd split this into a separate patch now. --- mm/memcontrol.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e2def11f1ec1..7c9a8971d0f4 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c@@ -2497,16 +2497,17 @@ retry: goto done; size = batch * PAGE_SIZE; - if (!res_counter_charge(&memcg->res, size, &fail_res)) { - if (!do_swap_account) + if (!do_swap_account || + !res_counter_charge(&memcg->memsw, size, &fail_res)) { + if (!res_counter_charge(&memcg->res, size, &fail_res)) goto done_restock; - if (!res_counter_charge(&memcg->memsw, size, &fail_res)) - goto done_restock; - res_counter_uncharge(&memcg->res, size); + if (do_swap_account) + res_counter_uncharge(&memcg->memsw, size); + mem_over_limit = mem_cgroup_from_res_counter(fail_res, res); + } else { mem_over_limit = mem_cgroup_from_res_counter(fail_res, memsw); may_swap = false; - } else - mem_over_limit = mem_cgroup_from_res_counter(fail_res, res); + } if (batch > nr_pages) { batch = nr_pages;
--
2.1.0
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>