Re: [PATCH v3] memcg: charge before adding to swapcache on swapin
From: Hugh Dickins <hidden>
Date: 2021-03-05 08:07:07
Also in:
linux-mm, lkml
On Wed, 3 Mar 2021, Shakeel Butt wrote:
Currently the kernel adds the page, allocated for swapin, to the swapcache before charging the page. This is fine, but now we want a per-memcg swapcache stat, which is essential for folks who want to transparently migrate from cgroup v1's memsw to cgroup v2's memory and swap counters. In addition, charging a page before exposing it to other parts of the kernel is a step in the right direction.

To correctly maintain the per-memcg swapcache stat, this patch charges the page before adding it to the swapcache. One challenge with this approach is the failure case of add_to_swap_cache(), in which we need to undo the mem_cgroup_charge(). Specifically, undoing mem_cgroup_uncharge_swap() is not simple.

To resolve the issue, this patch introduces a transaction-like interface to charge a page for swapin. The function mem_cgroup_charge_swapin_page() initiates the charging of the page and mem_cgroup_finish_swapin_page() completes the charging process. So the kernel starts the charging process of the page for swapin with mem_cgroup_charge_swapin_page(), adds the page to the swapcache, and on success completes the charging process with mem_cgroup_finish_swapin_page().

Signed-off-by: Shakeel Butt <redacted>
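
The ordering described above can be illustrated with a small userspace toy model. All toy_* names are hypothetical, and the two counters only loosely mimic cgroup v1's memory and memsw page counters; this is not the kernel implementation. It shows why charging first makes the failure path easy: when add_to_swap_cache() fails, only the page charge has to be dropped, because the swap slot's charge has not been touched yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of one cgroup v1 memcg; not kernel code. */
struct toy_memcg {
	long memory; /* pages charged to the memory counter */
	long memsw;  /* pages charged to the unified memory+swap counter */
};

/* Hypothetical swap slot that remembers which memcg holds its charge. */
struct toy_swap_slot {
	struct toy_memcg *owner; /* NULL once the slot's charge is dropped */
};

/* Step 1: charge the freshly allocated page before it is visible. */
static void toy_charge_swapin_page(struct toy_memcg *memcg)
{
	memcg->memory++;
	memcg->memsw++;
}

/* Undo of step 1: freeing the never-published page drops its charge. */
static void toy_uncharge_page(struct toy_memcg *memcg)
{
	assert(memcg->memory > 0 && memcg->memsw > 0);
	memcg->memory--;
	memcg->memsw--;
}

/* Step 3: drop the slot's memsw charge, now duplicated by the page. */
static void toy_uncharge_swapin_swap(struct toy_swap_slot *slot)
{
	if (slot->owner != NULL) {
		assert(slot->owner->memsw > 0);
		slot->owner->memsw--;
		slot->owner = NULL;
	}
}

/*
 * Toy swapin path. add_ok stands in for add_to_swap_cache() succeeding.
 * Returns true if the page ended up in the (toy) swapcache.
 */
bool toy_swapin(struct toy_memcg *memcg, struct toy_swap_slot *slot,
		bool add_ok)
{
	toy_charge_swapin_page(memcg);   /* step 1: charge first */
	if (!add_ok) {                    /* step 2 failed */
		toy_uncharge_page(memcg); /* only the page charge to undo */
		return false;             /* slot keeps its own charge */
	}
	toy_uncharge_swapin_swap(slot);  /* step 3: drop the duplicate */
	return true;
}
```

Starting from one swapped-out page (memory = 0, memsw = 1), a failed add leaves the counters exactly as they were, and a successful swapin ends with memory = 1, memsw = 1. In the patch itself the first and last steps are mem_cgroup_charge_swapin_page() and mem_cgroup_finish_swapin_page(); the toy only tracks page counts.
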
Quite apart from helping with the stat you want, what you've ended up with here is a nice cleanup in several different ways (and I'm glad Johannes talked you out of __GFP_NOFAIL: much better like this). I'll say

Acked-by: Hugh Dickins <redacted>

but I am quite unhappy with the name mem_cgroup_finish_swapin_page(): it doesn't finish the swapin, it doesn't finish the page, and I'm not persuaded by your paragraph above that there's any "transaction" here (if there were, I'd suggest "commit" instead of "finish"; and I'd get worried by the css_put before it's called - but no, that's fine, it's independent).

How about complementing mem_cgroup_charge_swapin_page() with mem_cgroup_uncharge_swapin_swap()? I think that describes well what it does, at least in the do_memsw_account() case, and I hope we can overlook that it does nothing at all in the other case.

And it really doesn't need a page argument: both places it's called have just allocated an order-0 page, so there's no chance of a THP here; but you might have some idea of future expansion, or of matching put_swap_page() - I won't object if you prefer to pass in the page.

But more interesting, though off-topic, comments on it below...
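
A minimal sketch of the shape suggested above, assuming the helper takes only the swap entry and always drops exactly one slot (nr_pages == 1, since the page is order-0). swp_entry_t, do_memsw_account() and mem_cgroup_uncharge_swap() are replaced by userspace stubs so the fragment compiles outside the kernel; the memsw_accounting flag and the last_uncharged_* recorders are hypothetical test scaffolding.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins so the sketch compiles; the kernel defines these. */
typedef struct {
	unsigned long val;
} swp_entry_t;

static bool memsw_accounting = true;       /* hypothetical knob for the stub */
static unsigned long last_uncharged_entry; /* records stub calls */
static unsigned int last_uncharged_nr;

/* Stub: true when the unified memory+swap counter is in use. */
static bool do_memsw_account(void)
{
	return memsw_accounting;
}

/* Stub: just records what it was asked to uncharge. */
static void mem_cgroup_uncharge_swap(swp_entry_t entry, unsigned int nr_pages)
{
	assert(nr_pages > 0);
	last_uncharged_entry = entry.val;
	last_uncharged_nr = nr_pages;
}

/*
 * Sketch of the suggested helper: no page argument, because both callers
 * have just allocated an order-0 page, so exactly one swap slot is dropped.
 */
void mem_cgroup_uncharge_swapin_swap(swp_entry_t entry)
{
	/* No unified memsw counter: nothing to do (the "other case"). */
	if (!do_memsw_account())
		return;
	/* The page now carries the memory+swap charge; drop the duplicate. */
	mem_cgroup_uncharge_swap(entry, 1);
}
```

If the page were passed in after all, the constant 1 would become thp_nr_pages(page), as in the hunk quoted below.
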
+/*
+ * mem_cgroup_finish_swapin_page - complete the swapin page charge transaction
+ * @page: page charged for swapin
+ * @entry: swap entry for which the page is charged
+ *
+ * This function completes the transaction of charging the page allocated for
+ * swapin.
+ */
+void mem_cgroup_finish_swapin_page(struct page *page, swp_entry_t entry)
+{
 	/*
 	 * Cgroup1's unified memory+swap counter has been charged with the
 	 * new swapcache page, finish the transfer by uncharging the swap
@@ -6760,20 +6796,14 @@ int mem_cgroup_charge(struct page *page, struct mm_struct *mm, gfp_t gfp_mask)
 	 * correspond 1:1 to page and swap slot lifetimes: we charge the
 	 * page to memory here, and uncharge swap when the slot is freed.
 	 */
-	if (do_memsw_account() && PageSwapCache(page)) {
-		swp_entry_t entry = { .val = page_private(page) };
+	if (!mem_cgroup_disabled() && do_memsw_account()) {
I understand why you put that !mem_cgroup_disabled() check in there,
but I have a series of observations on that.

First I was going to say that it would be better left to
mem_cgroup_uncharge_swap() itself.

Then I was going to say that I think it's already covered here
by the cgroup_memory_noswap check inside do_memsw_account().
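
The second observation can be checked mechanically. The sketch below is a userspace model, not kernel code: the three bools stand in for mem_cgroup_disabled(), cgroup_subsys_on_dfl() and cgroup_memory_noswap, and it assumes that swap init forces cgroup_memory_noswap on when the memory controller is disabled, and that do_memsw_account() is true only for cgroup v1 with swap accounting enabled. Under those assumptions the extra !mem_cgroup_disabled() test never changes the result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the flags involved; not kernel code. */
static bool memcg_disabled;       /* stands in for mem_cgroup_disabled() */
static bool on_cgroup2;           /* stands in for cgroup_subsys_on_dfl() */
static bool cgroup_memory_noswap; /* models the kernel variable of that name */

/* Assumed init rule: no memory control implies no swap control. */
static void model_swap_init(void)
{
	if (memcg_disabled)
		cgroup_memory_noswap = true;
}

/* Assumed to mirror do_memsw_account(): cgroup v1 with swap accounting. */
static bool model_do_memsw_account(void)
{
	/* The invariant the argument relies on, once init has run. */
	assert(!memcg_disabled || cgroup_memory_noswap);
	return !on_cgroup2 && !cgroup_memory_noswap;
}

/*
 * Compare the patch's check with the plain do_memsw_account() check over
 * every combination of the three flags. Returns true if they always agree.
 */
bool checks_agree_for_all_configs(void)
{
	for (int mask = 0; mask < 8; mask++) {
		memcg_disabled = mask & 1;
		on_cgroup2 = mask & 2;
		cgroup_memory_noswap = mask & 4;
		model_swap_init();

		bool with_disabled = !memcg_disabled && model_do_memsw_account();
		bool without_disabled = model_do_memsw_account();
		if (with_disabled != without_disabled)
			return false;
	}
	return true;
}
```
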
Then, going back to mem_cgroup_uncharge_swap(), I realized that 5.8's
2d1c498072de ("mm: memcontrol: make swap tracking an integral part of
memory control") removed the do_swap_account or cgroup_memory_noswap
checks from mem_cgroup_uncharge_swap() and swap_cgroup_swapon() and
swap_cgroup_swapoff() - so since then we have been allocating totally
unnecessary swap_cgroup arrays when mem_cgroup_disabled() (and
mem_cgroup_uncharge_swap() has worked by reading the zalloced array).
I think, or am I confused? If I'm right on that, one of us ought to
send another patch putting back either cgroup_memory_noswap checks
or mem_cgroup_disabled() checks in those three places - I suspect the
static key mem_cgroup_disabled() is preferable, but I'm getting dozy.
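
One possible shape for such a follow-up, sketched in userspace: the swapon path returns early so no per-slot array is allocated, and the uncharge-side lookup returns the "no owner" id 0 that the zero-filled array used to supply. struct toy_swap_map, toy_swapon(), toy_swapoff() and toy_uncharge_lookup() are hypothetical stand-ins for swap_cgroup_swapon(), swap_cgroup_swapoff() and the array read in mem_cgroup_uncharge_swap(); a plain bool replaces the mem_cgroup_disabled() static key, and treating id 0 as "no owner" is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stands in for the mem_cgroup_disabled() static key (hypothetical). */
static bool memcg_disabled;

/* Hypothetical per-swap-device map: one memcg id per slot, 0 = no owner. */
struct toy_swap_map {
	unsigned short *ids;
	size_t nr_slots;
};

/* Models swap_cgroup_swapon() with the early return put back. */
int toy_swapon(struct toy_swap_map *map, size_t nr_slots)
{
	map->ids = NULL;
	map->nr_slots = 0;
	if (memcg_disabled)
		return 0; /* nothing will ever be recorded: skip the array */
	map->ids = calloc(nr_slots, sizeof(*map->ids)); /* zero-filled */
	if (map->ids == NULL)
		return -1; /* allocation failure */
	map->nr_slots = nr_slots;
	return 0;
}

/* Models swap_cgroup_swapoff(); free(NULL) is a no-op if swapon skipped. */
void toy_swapoff(struct toy_swap_map *map)
{
	free(map->ids);
	map->ids = NULL;
	map->nr_slots = 0;
}

/* Models the uncharge-side read: clear a slot's id and return the old one. */
unsigned short toy_uncharge_lookup(struct toy_swap_map *map, size_t slot)
{
	if (memcg_disabled)
		return 0; /* same answer the zeroed array gave, with no array */
	assert(slot < map->nr_slots);
	unsigned short old = map->ids[slot];
	map->ids[slot] = 0;
	return old;
}
```

Whether the guard should test mem_cgroup_disabled() or cgroup_memory_noswap is the open question above; the sketch only shows the early-return pattern.
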
Whatever we do with that - and it's really not any business for this
patch - I think you can drop the mem_cgroup_disabled() check from
mem_cgroup_uncharge_swapin_swap().
 		/*
 		 * The swap entry might not get freed for a long time,
 		 * let's not wait for it. The page already received a
 		 * memory+swap charge, drop the swap entry duplicate.
 		 */
-		mem_cgroup_uncharge_swap(entry, nr_pages);
+		mem_cgroup_uncharge_swap(entry, thp_nr_pages(page));
 	}
-
-out_put:
-	css_put(&memcg->css);
-out:
-	return ret;
 }