Re: [PATCH v5 2/6] mm: remap unused subpages to shared zeropage when splitting isolated thp
From: Lance Yang <lance.yang@linux.dev>
Date: 2025-09-19 13:24:57
Also in:
linux-doc, linux-mediatek, linux-mm, lkml
On 2025/9/19 21:09, David Hildenbrand wrote:
> On 19.09.25 14:19, Lance Yang wrote:
>> Hey David,
>>
>> I believe I've found the exact reason why KSM skips MTE-tagged pages ;p
>>
>> On 2025/9/19 16:14, Lance Yang wrote:
>>> On 2025/9/19 15:55, David Hildenbrand wrote:
>>>>>> I think where possible we really only want to identify problematic
>>>>>> (tagged) pages and skip them. And we should either look into fixing
>>>>>> KSM as well or finding out why KSM is not affected.
>>>>>
>>>>> Yeah. Seems like we could introduce a new helper,
>>>>> folio_test_mte_tagged(struct folio *folio). By default, it would
>>>>> return false, and architectures like arm64 can override it.
>>>>
>>>> If we add a new helper it should instead express the semantics that
>>>> we cannot deduplicate.
>>>
>>> Agreed.
>>>
>>>> For THP, I recall that only some pages might be tagged. So likely we
>>>> want to check per page.
>>>
>>> Yes, a per-page check would be simpler.
>>>
>>>>> Looking at the code, the PG_mte_tagged flag is not set for regular
>>>>> THP.
>>>>
>>>> I think it's supported for THP per page. Only for hugetlb we tag the
>>>> whole thing through the head page instead of individual pages.
>>>
>>> Right. That's exactly what I meant.
>>>
>>>>> The MTE status actually comes from the VM_MTE flag in the VMA that
>>>>> maps it.
>>>>
>>>> During the rmap walk we could check the VMA flag, but there would be
>>>> no way to just stop the THP shrinker scanning this page early.
>>>>
>>>>> static inline bool folio_test_hugetlb_mte_tagged(struct folio *folio)
>>>>> {
>>>>> 	bool ret = test_bit(PG_mte_tagged, &folio->flags.f);
>>>>>
>>>>> 	VM_WARN_ON_ONCE(!folio_test_hugetlb(folio));
>>>>>
>>>>> 	/*
>>>>> 	 * If the folio is tagged, ensure ordering with a likely subsequent
>>>>> 	 * read of the tags.
>>>>> 	 */
>>>>> 	if (ret)
>>>>> 		smp_rmb();
>>>>> 	return ret;
>>>>> }
>>>>>
>>>>> static inline bool page_mte_tagged(struct page *page)
>>>>> {
>>>>> 	bool ret = test_bit(PG_mte_tagged, &page->flags.f);
>>>>>
>>>>> 	VM_WARN_ON_ONCE(folio_test_hugetlb(page_folio(page)));
>>>>>
>>>>> 	/*
>>>>> 	 * If the page is tagged, ensure ordering with a likely subsequent
>>>>> 	 * read of the tags.
>>>>> 	 */
>>>>> 	if (ret)
>>>>> 		smp_rmb();
>>>>> 	return ret;
>>>>> }
>>>>>
>>>>> contpte_set_ptes()
>>>>> 	__set_ptes()
>>>>> 		__set_ptes_anysz()
>>>>> 			__sync_cache_and_tags()
>>>>> 				mte_sync_tags()
>>>>> 					set_page_mte_tagged()
>>>>>
>>>>> Then, having the THP shrinker skip any folios that are identified as
>>>>> MTE-tagged.
>>>>
>>>> Likely we should just do something like (maybe we want better naming)
>>>>
>>>> #ifndef page_is_mergable
>>>> #define page_is_mergable(page) (true)
>>>> #endif
>>>
>>> Maybe something like page_is_optimizable()? Just a thought ;p
>>>
>>>> And for arm64 have it be
>>>>
>>>> #define page_is_mergable(page) (!page_mte_tagged(page))
>>>>
>>>> And then do
>>>>
>>>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>>>> index 1f0813b956436..1cac9093918d6 100644
>>>> --- a/mm/huge_memory.c
>>>> +++ b/mm/huge_memory.c
>>>> @@ -4251,7 +4251,8 @@ static bool thp_underused(struct folio *folio)
>>>>  
>>>>  	for (i = 0; i < folio_nr_pages(folio); i++) {
>>>>  		kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
>>>> -		if (!memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>> +		if (page_is_mergable(folio_page(folio, i)) &&
>>>> +		    !memchr_inv(kaddr, 0, PAGE_SIZE)) {
>>>>  			num_zero_pages++;
>>>>  			if (num_zero_pages > khugepaged_max_ptes_none) {
>>>>  				kunmap_local(kaddr);
>>>> diff --git a/mm/migrate.c b/mm/migrate.c
>>>> index 946253c398072..476a9a9091bd3 100644
>>>> --- a/mm/migrate.c
>>>> +++ b/mm/migrate.c
>>>> @@ -306,6 +306,8 @@ static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvmw,
>>>>  
>>>>  	if (PageCompound(page))
>>>>  		return false;
>>>> +	if (!page_is_mergable(page))
>>>> +		return false;
>>>>  	VM_BUG_ON_PAGE(!PageAnon(page), page);
>>>>  	VM_BUG_ON_PAGE(!PageLocked(page), page);
>>>>  	VM_BUG_ON_PAGE(pte_present(ptep_get(pvmw->pte)), page);
>>>
>>> Looks good to me!
>>>
>>>> For KSM, similarly just bail out early. But still wondering if this
>>>> is already checked somehow for KSM.
>>>
>>> +1
>>>
>>> I'm looking for a machine to test it on.
>>
>> Interestingly, it seems KSM is already skipping MTE-tagged pages. My
>> test, running on a v6.8.0 kernel inside QEMU (with MTE enabled), shows
>> no merging activity for those pages ...
>>
>> KSM's call to pages_identical() ultimately leads to memcmp_pages(). The
>> arm64 implementation of memcmp_pages() in arch/arm64/kernel/mte.c
>> contains a specific check that prevents merging in this case.
>>
>> try_to_merge_one_page()
>> 	-> pages_identical()
>> 		-> !memcmp_pages() Fails!
>> 	-> replace_page()
>>
>> int memcmp_pages(struct page *page1, struct page *page2)
>> {
>> 	char *addr1, *addr2;
>> 	int ret;
>>
>> 	addr1 = page_address(page1);
>> 	addr2 = page_address(page2);
>> 	ret = memcmp(addr1, addr2, PAGE_SIZE);
>>
>> 	if (!system_supports_mte() || ret)
>> 		return ret;
>>
>> 	/*
>> 	 * If the page content is identical but at least one of the pages is
>> 	 * tagged, return non-zero to avoid KSM merging. If only one of the
>> 	 * pages is tagged, __set_ptes() may zero or change the tags of the
>> 	 * other page via mte_sync_tags().
>> 	 */
>> 	if (page_mte_tagged(page1) || page_mte_tagged(page2))
>> 		return addr1 != addr2;
>>
>> 	return ret;
>> }
>>
>> IIUC, if either page is MTE-tagged, memcmp_pages() intentionally returns
>> a non-zero value, which in turn causes pages_identical() to return
>> false.
>
> Cool, so we should likely just use that then in the shrinker code. Can
> you send a fix?

Certainly! I'll get on that ;p

Cheers,
Lance
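
For reference, here is a minimal userspace sketch of the memcmp_pages() behaviour quoted above, and of how a shrinker-side zero check could lean on it. It is only a model under stated assumptions, not the kernel code and not the eventual fix: struct model_page, MODEL_PAGE_SIZE and every model_* function are hypothetical names, the mte_tagged field stands in for PG_mte_tagged, and MTE support is assumed to be always present.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096	/* hypothetical; stands in for PAGE_SIZE */

/* Hypothetical stand-in for struct page: payload plus a "tagged" flag. */
struct model_page {
	unsigned char data[MODEL_PAGE_SIZE];
	bool mte_tagged;	/* models PG_mte_tagged */
};

/*
 * Models the arm64 memcmp_pages() quoted in the thread, assuming the
 * system always supports MTE.
 */
static int model_memcmp_pages(const struct model_page *p1,
			      const struct model_page *p2)
{
	int ret = memcmp(p1->data, p2->data, MODEL_PAGE_SIZE);

	if (ret)
		return ret;

	/*
	 * Contents are identical. If either page is tagged, report a
	 * mismatch unless both pointers refer to the very same page.
	 */
	if (p1->mte_tagged || p2->mte_tagged)
		return p1 != p2;

	return 0;
}

/* Models pages_identical(): true only when the comparison returns 0. */
static bool model_pages_identical(const struct model_page *p1,
				  const struct model_page *p2)
{
	return !model_memcmp_pages(p1, p2);
}

/*
 * Hypothetical shrinker-side check: a subpage counts as "zero-filled and
 * safe to remap" only if it compares identical to the zero page, so a
 * tagged subpage is rejected even when its data is all zeroes.
 */
static bool model_subpage_is_zero_mergeable(const struct model_page *page,
					    const struct model_page *zero_page)
{
	return model_pages_identical(page, zero_page);
}
```

In this model a zero-filled untagged page matches the zero page, while a zero-filled tagged page does not. That is the property that would let the shrinker skip tagged subpages without a separate page_is_mergable() style helper.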