Re: [PATCH] memcg: charge before adding to swapcache on swapin
From: Johannes Weiner <hidden>
Date: 2021-02-20 00:36:12
Also in: linux-mm, lkml
On Fri, Feb 19, 2021 at 02:44:05PM -0800, Shakeel Butt wrote:
> Currently the kernel adds the page, allocated for swapin, to the
> swapcache before charging the page. This is fine but now we want a
> per-memcg swapcache stat which is essential for folks who want to
> transparently migrate from cgroup v1's memsw to cgroup v2's memory and
> swap counters.
>
> To correctly maintain the per-memcg swapcache stat, one option which
> this patch has adopted is to charge the page before adding it to
> swapcache. One challenge in this option is the failure case of
> add_to_swap_cache() on which we need to undo the mem_cgroup_charge().
> Specifically undoing mem_cgroup_uncharge_swap() is not simple.
>
> This patch circumvents this specific issue by removing the failure path
> of add_to_swap_cache() by providing __GFP_NOFAIL. Please note that in
> this specific situation ENOMEM was the only possible failure of
> add_to_swap_cache() which is removed by using __GFP_NOFAIL.
>
> Another option was to use __mod_memcg_lruvec_state(NR_SWAPCACHE) in
> mem_cgroup_charge() but then we need to take care of the
> do_swap_page() case where synchronous swap devices bypass the
> swapcache. The do_swap_page() already does hackery to set and reset
> the PageSwapCache bit to make mem_cgroup_charge() execute the swap
> accounting code and then we would need to add an additional parameter
> to tell it not to touch the NR_SWAPCACHE stat as that code path
> bypasses the swapcache.
>
> This patch adds a memcg charging API explicitly for swapin pages and
> cleans up do_swap_page() to not set and reset the PageSwapCache bit.
>
> Signed-off-by: Shakeel Butt <redacted>

The patch makes sense to me. While it extends the charge interface, I
actually quite like that it charges the page earlier - before putting
it into wider circulation. It's a step in the right direction.

But IMO the semantics of mem_cgroup_charge_swapin_page() are a bit too
fickle: the __GFP_NOFAIL in add_to_swap_cache() works around it, but
having a must-not-fail-after-this line makes the code tricky to work
on and error-prone.

It would be nicer to do a proper transaction sequence.

> @@ -497,16 +497,15 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>  	__SetPageLocked(page);
>  	__SetPageSwapBacked(page);
> -	/* May fail (-ENOMEM) if XArray node allocation failed. */
> -	if (add_to_swap_cache(page, entry, gfp_mask & GFP_RECLAIM_MASK, &shadow)) {
> -		put_swap_page(page, entry);
> +	if (mem_cgroup_charge_swapin_page(page, NULL, gfp_mask, entry))
>  		goto fail_unlock;
> -	}
> -	if (mem_cgroup_charge(page, NULL, gfp_mask)) {
> -		delete_from_swap_cache(page);
> -		goto fail_unlock;
> -	}
> +	/*
> +	 * Use __GFP_NOFAIL to not worry about undoing the changes done by
> +	 * mem_cgroup_charge_swapin_page() on failure of add_to_swap_cache().
> +	 */
> +	add_to_swap_cache(page, entry,
> +			  (gfp_mask|__GFP_NOFAIL) & GFP_RECLAIM_MASK, &shadow);

How about:

	mem_cgroup_charge_swapin_page()
	add_to_swap_cache()
	mem_cgroup_finish_swapin_page()

where finish_swapin_page() only uncharges the swap entry (on cgroup1)
once the swap->memory transition is complete?

Otherwise the patch looks good to me.
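
To illustrate the ordering, here is a rough user-space sketch of the three-step sequence. It is a toy model, not kernel code: every toy_* type and function, the limit parameter, and the simulate_enomem switch are made up for illustration, and the counters only loosely stand in for the memory charge and the cgroup1 swap entry charge. The point it tries to show is that if the swap entry charge is left alone until the final step, a failure in the middle step can be unwound with a plain memory uncharge.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memcg; not a kernel structure. */
struct toy_memcg {
	long memory;	/* pages charged as memory */
	long swap;	/* swap entries still charged (cgroup1-style) */
};

/* Hypothetical stand-in for a page being swapped in. */
struct toy_page {
	struct toy_memcg *memcg;	/* NULL while uncharged */
	bool in_swapcache;
};

/* Step 1: charge the page as memory; the swap entry charge is untouched. */
static int toy_charge_swapin_page(struct toy_page *page,
				  struct toy_memcg *memcg, long limit)
{
	if (memcg->memory + 1 > limit)
		return -ENOMEM;
	memcg->memory++;
	page->memcg = memcg;
	return 0;
}

/* Undo of step 1. Simple, because step 1 never touched the swap charge. */
static void toy_uncharge_page(struct toy_page *page)
{
	assert(page->memcg);
	page->memcg->memory--;
	page->memcg = NULL;
}

/* Step 2: insertion into the swapcache, which is allowed to fail. */
static int toy_add_to_swap_cache(struct toy_page *page, bool simulate_enomem)
{
	if (simulate_enomem)
		return -ENOMEM;
	page->in_swapcache = true;
	return 0;
}

/* Step 3: commit. Only now is the swap entry charge dropped. */
static void toy_finish_swapin_page(struct toy_page *page)
{
	assert(page->memcg && page->in_swapcache);
	page->memcg->swap--;
}

/* The full sequence, with a normal error path instead of a NOFAIL step. */
static int toy_swapin(struct toy_page *page, struct toy_memcg *memcg,
		      long limit, bool simulate_enomem)
{
	int err;

	err = toy_charge_swapin_page(page, memcg, limit);
	if (err)
		return err;

	err = toy_add_to_swap_cache(page, simulate_enomem);
	if (err) {
		toy_uncharge_page(page);
		return err;
	}

	toy_finish_swapin_page(page);
	return 0;
}
```

Calling toy_swapin() with simulate_enomem set leaves both counters exactly as they were before the call, which is the property the __GFP_NOFAIL workaround otherwise has to guarantee by forbidding the failure.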