Thread (22 messages) 22 messages, 5 authors, 2021-08-14

Re: [PATCH 2/3] mm: free zapped tail pages when splitting isolated thp

From: Yang Shi <hidden>
Date: 2021-08-11 22:25:36
Also in: lkml

On Sun, Aug 8, 2021 at 10:49 AM Yu Zhao [off-list ref] wrote:
On Wed, Aug 4, 2021 at 6:13 PM Yang Shi [off-list ref] wrote:
quoted
On Fri, Jul 30, 2021 at 11:39 PM Yu Zhao [off-list ref] wrote:
quoted
If a tail page has only two references left, one inherited from the
isolation of its head and the other from lru_add_page_tail() which we
are about to drop, it means this tail page was concurrently zapped.
Then we can safely free it and save page reclaim or migration the
trouble of trying it.

Signed-off-by: Yu Zhao <redacted>
Tested-by: Shuang Zhai <redacted>
---
 include/linux/vm_event_item.h |  1 +
 mm/huge_memory.c              | 28 ++++++++++++++++++++++++++++
 mm/vmstat.c                   |  1 +
 3 files changed, 30 insertions(+)
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index ae0dd1948c2b..829eeac84094 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -99,6 +99,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
                THP_SPLIT_PUD,
 #endif
+               THP_SPLIT_FREE,
                THP_ZERO_PAGE_ALLOC,
                THP_ZERO_PAGE_ALLOC_FAILED,
                THP_SWPOUT,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index d8b655856e79..5120478bca41 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2432,6 +2432,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
        struct address_space *swap_cache = NULL;
        unsigned long offset = 0;
        unsigned int nr = thp_nr_pages(head);
+       LIST_HEAD(pages_to_free);
+       int nr_pages_to_free = 0;
        int i;

        VM_BUG_ON_PAGE(list && PageLRU(head), head);
@@ -2506,6 +2508,25 @@ static void __split_huge_page(struct page *page, struct list_head *list,
                        continue;
                unlock_page(subpage);

+               /*
+                * If a tail page has only two references left, one inherited
+                * from the isolation of its head and the other from
+                * lru_add_page_tail() which we are about to drop, it means this
+                * tail page was concurrently zapped. Then we can safely free it
+                * and save page reclaim or migration the trouble of trying it.
+                */
+               if (list && page_ref_freeze(subpage, 2)) {
+                       VM_BUG_ON_PAGE(PageLRU(subpage), subpage);
+                       VM_BUG_ON_PAGE(PageCompound(subpage), subpage);
+                       VM_BUG_ON_PAGE(page_mapped(subpage), subpage);
+
+                       ClearPageActive(subpage);
+                       ClearPageUnevictable(subpage);
+                       list_move(&subpage->lru, &pages_to_free);
+                       nr_pages_to_free++;
+                       continue;
+               }
Yes, such page could be freed instead of swapping out. But I'm
wondering if we could have some simpler implementation. Since such
pages will be re-added to page list, so we should be able to check
their refcount in shrink_page_list(). If the refcount is 1, the
refcount inc'ed by lru_add_page_tail() has been put by later
put_page(), we know it is freed under us since the only refcount comes
from isolation, we could just jump to "keep" (the label in
shrink_page_list()), then such page will be freed later by
shrink_inactive_list().

For MADV_PAGEOUT, I think we could add some logic to handle such page
after shrink_page_list(), just like what shrink_inactive_list() does.

Migration already handles refcount == 1 page, so should not need any change.

Is this idea feasible?
Yes, but then we would have to loop over the tail pages twice, here
and in shrink_page_list(), right?
I don't quite get what you mean "loop over the tail pages twice". Once
THP is isolated then get split, all the tail pages will be put on the
list (local list for isolated pages), then the reclaimer would deal
with the head page, then continue to iterate the list to deal with
tail pages. Your patch could free the tail pages earlier. But it
should not make too much difference to free the tail pages a little
bit later IMHO.
In addition, if we try to freeze the refcount of a page in
shrink_page_list(), we couldn't be certain whether this page used to
be a tail page. So we would have to test every page. If a page wasn't
a tail page, it's unlikely for its refcount to drop unless there is a
race. But this patch isn't really intended to optimize such a race.
It's mainly for the next, i.e., we know there is a good chance to drop
tail pages (~10% on our systems). Sounds reasonable? Thanks.
I'm not sure what is the main source of the partial mapped THPs from
your fleets. But if most of them are generated by MADV_DONTNEED (this
is used by some userspace memory allocator libs), they should be on
deferred split list too. Currently deferred split shrinker just
shrinks those THPs (simply split them and free unmapped sub pages)
proportionally, we definitely could shrink them more aggressively, for
example, by setting shrinker->seeks to 0. I'm wondering if this will
achieve a similar effect or not.

I really don't have any objection to free such pages, but just
wondering if we could have something simpler or not.
quoted
quoted
+
                /*
                 * Subpages may be freed if there wasn't any mapping
                 * like if add_to_swap() is running on a lru page that
@@ -2515,6 +2536,13 @@ static void __split_huge_page(struct page *page, struct list_head *list,
                 */
                put_page(subpage);
        }
+
+       if (!nr_pages_to_free)
+               return;
+
+       mem_cgroup_uncharge_list(&pages_to_free);
+       free_unref_page_list(&pages_to_free);
+       count_vm_events(THP_SPLIT_FREE, nr_pages_to_free);
 }

 int total_mapcount(struct page *page)
diff --git a/mm/vmstat.c b/mm/vmstat.c
index b0534e068166..f486e5d98d96 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1300,6 +1300,7 @@ const char * const vmstat_text[] = {
 #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
        "thp_split_pud",
 #endif
+       "thp_split_free",
        "thp_zero_page_alloc",
        "thp_zero_page_alloc_failed",
        "thp_swpout",
--
2.32.0.554.ge1b32706d8-goog
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help