Thread: 15 messages, 4 authors, 2025-09-19

Re: [PATCH v5 2/6] mm: remap unused subpages to shared zeropage when splitting isolated thp

From: Lance Yang <lance.yang@linux.dev>
Date: 2025-09-19 13:24:57
Also in: linux-doc, linux-mediatek, linux-mm, lkml


On 2025/9/19 21:09, David Hildenbrand wrote:
On 19.09.25 14:19, Lance Yang wrote:
Hey David,

I believe I've found the exact reason why KSM skips MTE-tagged pages ;p

On 2025/9/19 16:14, Lance Yang wrote:

On 2025/9/19 15:55, David Hildenbrand wrote:
I think where possible we really only want to identify problematic
(tagged) pages and skip them. And we should either look into fixing KSM
as well or finding out why KSM is not affected.
Yeah. Seems like we could introduce a new helper,
folio_test_mte_tagged(struct folio *folio). By default, it would return
false, and architectures like arm64 can override it.
If we add a new helper it should instead express the semantics that
we cannot deduplicate.
Agreed.
For THP, I recall that only some pages might be tagged. So likely we
want to check per page.
Yes, a per-page check would be simpler.
Looking at the code, the PG_mte_tagged flag is not set for regular THP.
I think it's supported for THP per page. Only for hugetlb we tag the
whole thing through the head page instead of individual pages.
Right. That's exactly what I meant.
The MTE
status actually comes from the VM_MTE flag in the VMA that maps it.
During the rmap walk we could check the VMA flag, but there would be
no way to just stop the THP shrinker scanning this page early.
static inline bool folio_test_hugetlb_mte_tagged(struct folio *folio)
{
     bool ret = test_bit(PG_mte_tagged, &folio->flags.f);

     VM_WARN_ON_ONCE(!folio_test_hugetlb(folio));

     /*
      * If the folio is tagged, ensure ordering with a likely subsequent
      * read of the tags.
      */
     if (ret)
         smp_rmb();
     return ret;
}

static inline bool page_mte_tagged(struct page *page)
{
     bool ret = test_bit(PG_mte_tagged, &page->flags.f);

     VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));

     /*
      * If the page is tagged, ensure ordering with a likely subsequent
      * read of the tags.
      */
     if (ret)
         smp_rmb();
     return ret;
}

contpte_set_ptes()
     __set_ptes()
         __set_ptes_anysz()
             __sync_cache_and_tags()
                 mte_sync_tags()
                     set_page_mte_tagged()

Then, having the THP shrinker skip any folios that are identified as
MTE-tagged.
Likely we should just do something like (maybe we want better naming)

#ifndef page_is_mergable
#define page_is_mergable(page) (true)
#endif

Maybe something like page_is_optimizable()? Just a thought ;p
And for arm64 have it be

#define page_is_mergable(page) (!page_mte_tagged(page))


And then do
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 1f0813b956436..1cac9093918d6 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -4251,7 +4251,8 @@ static bool thp_underused(struct folio *folio)
          for (i = 0; i < folio_nr_pages(folio); i++) {
                  kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
-               if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
+               if (page_is_mergable(folio_page(folio, i)) &&
+                   !memchr_inv(kaddr, 0, PAGE_SIZE)) {
                          num_zero_pages++;
                          if (num_zero_pages > khugepaged_max_ptes_none) {
                                  kunmap_local(kaddr);
diff --git a/mm/migrate.c b/mm/migrate.c
index 946253c398072..476a9a9091bd3 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -306,6 +306,8 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,

          if (PageCompound(page))
                  return false;
+       if (!page_is_mergable(page))
+               return false;
          VM_BUG_ON_PAGE(!PageAnon(page), page);
          VM_BUG_ON_PAGE(!PageLocked(page), page);
          VM_BUG_ON_PAGE(pte_present(ptep_get(pvmw->pte)), page);
Looks good to me!
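
For readers following along, here is a minimal userspace sketch of the idea in the diff above: a generic "mergeable" predicate that defaults to true, an override that rejects tagged pages, and a scan that counts only zero-filled pages passing the predicate. Everything prefixed with sketch_ is hypothetical and exists only for this illustration; the flag in struct sketch_page stands in for PG_mte_tagged, and the loop models thp_underused() rather than reproducing it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-in for struct page: contents plus a "tagged" flag. */
struct sketch_page {
	unsigned char data[SKETCH_PAGE_SIZE];
	bool mte_tagged;
};

/* "Arch" override, modelled on the proposed arm64 definition. */
#define sketch_page_is_mergable(page) (!(page)->mte_tagged)

/* Generic default, used only when no override was defined above. */
#ifndef sketch_page_is_mergable
#define sketch_page_is_mergable(page) (true)
#endif

/* Userspace replacement for !memchr_inv(kaddr, 0, PAGE_SIZE). */
static bool sketch_page_is_zero(const struct sketch_page *page)
{
	for (size_t i = 0; i < SKETCH_PAGE_SIZE; i++)
		if (page->data[i])
			return false;
	return true;
}

/*
 * Return true once more than max_none subpages are both mergeable and
 * zero-filled. Tagged pages are never counted, even if they are all zeroes.
 */
static bool sketch_thp_underused(const struct sketch_page *pages, size_t nr,
				 size_t max_none)
{
	size_t num_zero_pages = 0;

	for (size_t i = 0; i < nr; i++) {
		if (sketch_page_is_mergable(&pages[i]) &&
		    sketch_page_is_zero(&pages[i])) {
			if (++num_zero_pages > max_none)
				return true;
		}
	}
	return false;
}

/* Small usage example: tagging one zero page changes the verdict. */
static void sketch_self_check(void)
{
	static struct sketch_page pages[4];	/* zero-filled, untagged */

	pages[0].mte_tagged = false;
	assert(sketch_thp_underused(pages, 4, 3));
	pages[0].mte_tagged = true;
	assert(!sketch_thp_underused(pages, 4, 3));
}
```

Putting the predicate first in the condition means a tagged page is rejected before its contents are scanned, which matches the ordering in the proposed diff.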

For KSM, similarly just bail out early. But still wondering if this is
already checked somehow for KSM.
+1 I'm looking for a machine to test it on.
Interestingly, it seems KSM is already skipping MTE-tagged pages. My test,
running on a v6.8.0 kernel inside QEMU (with MTE enabled), shows no merging
activity for those pages ...
KSM's call to pages_identical() ultimately leads to memcmp_pages(). The
arm64 implementation of memcmp_pages() in arch/arm64/kernel/mte.c contains
a specific check that prevents merging in this case.

try_to_merge_one_page()
    -> pages_identical()
        -> !memcmp_pages() Fails!
        -> replace_page()


int memcmp_pages(struct page *page1, struct page *page2)
{
    char *addr1, *addr2;
    int ret;

    addr1 = page_address(page1);
    addr2 = page_address(page2);
    ret = memcmp(addr1, addr2, PAGE_SIZE);

    if (!system_supports_mte() || ret)
        return ret;

    /*
     * If the page content is identical but at least one of the pages is
     * tagged, return non-zero to avoid KSM merging. If only one of the
     * pages is tagged, __set_ptes() may zero or change the tags of the
     * other page via mte_sync_tags().
     */
    if (page_mte_tagged(page1) || page_mte_tagged(page2))
        return addr1 != addr2;

    return ret;
}

IIUC, if either page is MTE-tagged, memcmp_pages() intentionally returns
a non-zero value, which in turn causes pages_identical() to return false.
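
To make that behaviour easy to poke at outside the kernel, here is a rough userspace model of the check. It is only a sketch: struct sketch_page, its mte_tagged flag, and the sketch_system_supports_mte variable are hypothetical stand-ins for struct page, PG_mte_tagged, and system_supports_mte(), and sketch_pages_identical() assumes pages_identical() is simply the negation of memcmp_pages(), as the call chain above suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-in for struct page: contents plus a "tagged" flag. */
struct sketch_page {
	unsigned char data[SKETCH_PAGE_SIZE];
	bool mte_tagged;
};

/* Stand-in for system_supports_mte(); toggle it to compare both paths. */
static bool sketch_system_supports_mte = true;

/* Model of the arm64 memcmp_pages() logic quoted above. */
static int sketch_memcmp_pages(const struct sketch_page *page1,
			       const struct sketch_page *page2)
{
	int ret = memcmp(page1->data, page2->data, SKETCH_PAGE_SIZE);

	if (!sketch_system_supports_mte || ret)
		return ret;

	/*
	 * Identical contents, but at least one page is tagged: report a
	 * difference unless both arguments are the very same page.
	 */
	if (page1->mte_tagged || page2->mte_tagged)
		return page1->data != page2->data;

	return ret;
}

/* Assumed shape of pages_identical(): true when the comparison returns 0. */
static bool sketch_pages_identical(const struct sketch_page *page1,
				   const struct sketch_page *page2)
{
	return !sketch_memcmp_pages(page1, page2);
}

/* Small usage example: tagging one of two equal pages blocks the merge. */
static void sketch_self_check(void)
{
	static struct sketch_page a, b;	/* zero-filled, untagged */

	a.mte_tagged = false;
	assert(sketch_pages_identical(&a, &b));
	a.mte_tagged = true;
	assert(!sketch_pages_identical(&a, &b));
}
```

With the flag set on either page, two byte-identical pages compare as different, so a caller modelled on try_to_merge_one_page() would never reach replace_page().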
Cool, so we should likely just use that then in the shrinker code. Can you
send a fix?
Certainly! I'll get on that ;p

Cheers,
Lance
