Thread (13 messages) 13 messages, 4 authors, 2021-03-26

Re: [PATCH] mm: page_alloc: fix memcg accounting leak in speculative cache lookup

From: Matthew Wilcox <hidden>
Date: 2021-03-26 02:53:44
Also in: linux-mm, lkml

On Thu, Mar 25, 2021 at 06:55:42PM -0700, Hugh Dickins wrote:
The first reason occurred to me this morning.  I thought I had been
clever to spot the PageHead race which you fix here.  But now I just feel
very stupid not to have spotted the very similar memcg_data race.  The
speculative racer may call mem_cgroup_uncharge() from __put_single_page(),
and the new call to split_page_memcg() do nothing because page_memcg(head)
is already NULL.

And is it even safe there, to sprinkle memcg_data through all of those
order-0 subpages, when free_the_page() is about to be applied to a
series of descending orders?  I could easily be wrong, but I think
free_pages_prepare()'s check_free_page() will find that is not
page_expected_state().
So back to something more like my original patch then?
+++ b/mm/page_alloc.c
@@ -5081,9 +5081,15 @@ void __free_pages(struct page *page, unsigned int order)
 {
        if (put_page_testzero(page))
                free_the_page(page, order);
-	else if (!PageHead(page))
-               while (order-- > 0)
-                       free_the_page(page + (1 << order), order);
+       else if (!PageHead(page)) {
+               while (order-- > 0) {
+                       struct page *tail = page + (1 << order);
+#ifdef CONFIG_MEMCG
+                       tail->memcg_data = page->memcg_data;
+#endif
+                       free_the_page(tail, order);
+               }
+       }
 }
 EXPORT_SYMBOL(__free_pages);
We can cache page->memcg_data before calling put_page_testzero(),
just like we cache the Head flag in Johannes' patch.
But, after all that, I'm now thinking that Matthew's original
e320d3012d25 ("mm/page_alloc.c: fix freeing non-compound pages")
is safer reverted.  The put_page_testzero() in __free_pages() was
not introduced for speculative pagecache: it was there in 2.4.0,
and atomic_dec_and_test() in 2.2, I don't have older trees to hand.
I think you're confused in that last assertion.  According to
linux-fullhistory, the first introduction of __free_pages was 2.3.29pre3
(September 1999), where it did indeed use put_page_testzero:

+extern inline void __free_pages(struct page *page, unsigned long order)
+{
+       if (!put_page_testzero(page))
+               return;
+       __free_pages_ok(page, order);
+}

Before that, we had only free_pages() and __free_page().
So, it has "always" been accepted that multiple references to a
high-order non-compound page can be given out and released: maybe
they were all released with __free_pages() of the right order, or
maybe only the last had to get that right; but as __free_pages()
stands today, all but the last caller frees all but the first
subpage.  A very rare leak seems much safer.

I don't have the answer (find somewhere in struct page to squirrel
away the order, even when it's a non-compound page?), and I think
each of us would much rather be thinking about other things at the
moment.  But for now it looks to me like NAK to this patch, and
revert of e320d3012d25.
We did discuss that possibility prior to the introduction of
e320d3012d25.  Here's one such:
https://lore.kernel.org/linux-mm/20200922031215.GZ32101-FZi0V3Vbi30CUdFEqe4BF2D2FQJk+8+b@public.gmane.org/T/#m0b08c0c3430e09e20fa6648877dc42b04b18e6f3
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help