Re: [PATCH mm-unstable v19 06/14] mm/khugepaged: generalize... | linux-trace-kernel

[PATCH mm-unstable v19 00/14] khugepaged: add mTHP collapse support · Nico Pache <npache@redhat.com> · 2026-06-05
[PATCH mm-unstable v19 01/14] mm/khugepaged: generalize hugepage_vma_revalidate for mTHP support · Nico Pache <npache@redhat.com> · 2026-06-05
[PATCH mm-unstable v19 02/14] mm/khugepaged: generalize alloc_charge_folio() · Nico Pache <npache@redhat.com> · 2026-06-05
[PATCH mm-unstable v19 03/14] mm/khugepaged: rework max_ptes_* handling with helper functions · Nico Pache <npache@redhat.com> · 2026-06-05
[PATCH mm-unstable v19 04/14] mm/khugepaged: generalize __collapse_huge_page_* for mTHP support · Nico Pache <npache@redhat.com> · 2026-06-05
Re: [PATCH mm-unstable v19 04/14] mm/khugepaged: generalize __collapse_huge_page_* for mTHP support · Zi Yan <ziy@nvidia.com> · 2026-06-05
[PATCH mm-unstable v19 05/14] mm/khugepaged: require collapse_huge_page to enter/exit with the lock dropped · Nico Pache <npache@redhat.com> · 2026-06-05
Re: [PATCH mm-unstable v19 05/14] mm/khugepaged: require collapse_huge_page to enter/exit with the lock dropped · Zi Yan <ziy@nvidia.com> · 2026-06-05
Re: [PATCH mm-unstable v19 05/14] mm/khugepaged: require collapse_huge_page to enter/exit with the lock dropped · Lance Yang <lance.yang@linux.dev> · 2026-06-08
[PATCH mm-unstable v19 06/14] mm/khugepaged: generalize collapse_huge_page for mTHP collapse · Nico Pache <npache@redhat.com> · 2026-06-05
Re: [PATCH mm-unstable v19 06/14] mm/khugepaged: generalize collapse_huge_page for mTHP collapse · "David Hildenbrand (Arm)" <david@kernel.org> · 2026-06-05
Re: [PATCH mm-unstable v19 06/14] mm/khugepaged: generalize collapse_huge_page for mTHP collapse · Lorenzo Stoakes <ljs@kernel.org> · 2026-06-05
Re: [PATCH mm-unstable v19 06/14] mm/khugepaged: generalize collapse_huge_page for mTHP collapse · Lorenzo Stoakes <ljs@kernel.org> · 2026-06-05
Re: [PATCH mm-unstable v19 06/14] mm/khugepaged: generalize collapse_huge_page for mTHP collapse · Lance Yang <lance.yang@linux.dev> · 2026-06-08
[PATCH mm-unstable v19 07/14] mm/khugepaged: skip collapsing mTHP to smaller orders · Nico Pache <npache@redhat.com> · 2026-06-05
Re: [PATCH mm-unstable v19 07/14] mm/khugepaged: skip collapsing mTHP to smaller orders · Lance Yang <lance.yang@linux.dev> · 2026-06-08
[PATCH mm-unstable v19 08/14] mm/khugepaged: add per-order mTHP collapse failure statistics · Nico Pache <npache@redhat.com> · 2026-06-05
Re: [PATCH mm-unstable v19 08/14] mm/khugepaged: add per-order mTHP collapse failure statistics · Lance Yang <lance.yang@linux.dev> · 2026-06-08
[PATCH mm-unstable v19 09/14] mm/khugepaged: improve tracepoints for mTHP orders · Nico Pache <npache@redhat.com> · 2026-06-05
Re: [PATCH mm-unstable v19 09/14] mm/khugepaged: improve tracepoints for mTHP orders · Lance Yang <lance.yang@linux.dev> · 2026-06-08
[PATCH mm-unstable v19 10/14] mm/khugepaged: introduce collapse_possible_orders helper functions · Nico Pache <npache@redhat.com> · 2026-06-05
Re: [PATCH mm-unstable v19 10/14] mm/khugepaged: introduce collapse_possible_orders helper functions · Lorenzo Stoakes <ljs@kernel.org> · 2026-06-05
Re: [PATCH mm-unstable v19 10/14] mm/khugepaged: introduce collapse_possible_orders helper functions · Lance Yang <lance.yang@linux.dev> · 2026-06-08
[PATCH mm-unstable v19 11/14] mm/khugepaged: Introduce mTHP collapse support · Nico Pache <npache@redhat.com> · 2026-06-05
Re: [PATCH mm-unstable v19 11/14] mm/khugepaged: Introduce mTHP collapse support · "David Hildenbrand (Arm)" <david@kernel.org> · 2026-06-05
Re: [PATCH mm-unstable v19 11/14] mm/khugepaged: Introduce mTHP collapse support · Lorenzo Stoakes <ljs@kernel.org> · 2026-06-05
Re: [PATCH mm-unstable v19 11/14] mm/khugepaged: Introduce mTHP collapse support · Nico Pache <npache@redhat.com> · 2026-06-09
Re: [PATCH mm-unstable v19 11/14] mm/khugepaged: Introduce mTHP collapse support · Lance Yang <lance.yang@linux.dev> · 2026-06-06
Re: [PATCH mm-unstable v19 11/14] mm/khugepaged: Introduce mTHP collapse support · "David Hildenbrand (Arm)" <david@kernel.org> · 2026-06-08
Re: [PATCH mm-unstable v19 11/14] mm/khugepaged: Introduce mTHP collapse support · Lance Yang <lance.yang@linux.dev> · 2026-06-08
Re: [PATCH mm-unstable v19 11/14] mm/khugepaged: Introduce mTHP collapse support · Lance Yang <lance.yang@linux.dev> · 2026-06-09
Re: [PATCH mm-unstable v19 11/14] mm/khugepaged: Introduce mTHP collapse support · Nico Pache <npache@redhat.com> · 2026-06-09
Re: [PATCH mm-unstable v19 11/14] mm/khugepaged: Introduce mTHP collapse support · "David Hildenbrand (Arm)" <david@kernel.org> · 2026-06-09
Re: [PATCH mm-unstable v19 11/14] mm/khugepaged: Introduce mTHP collapse support · Nico Pache <npache@redhat.com> · 2026-06-09
Re: [PATCH mm-unstable v19 11/14] mm/khugepaged: Introduce mTHP collapse support · Lance Yang <lance.yang@linux.dev> · 2026-06-09
Re: [PATCH mm-unstable v19 11/14] mm/khugepaged: Introduce mTHP collapse support · Nico Pache <npache@redhat.com> · 2026-06-09
Re: [PATCH mm-unstable v19 11/14] mm/khugepaged: Introduce mTHP collapse support · Lance Yang <lance.yang@linux.dev> · 2026-06-09
[PATCH mm-unstable v19 12/14] mm/khugepaged: avoid unnecessary mTHP collapse attempts · Nico Pache <npache@redhat.com> · 2026-06-05
Re: [PATCH mm-unstable v19 12/14] mm/khugepaged: avoid unnecessary mTHP collapse attempts · "David Hildenbrand (Arm)" <david@kernel.org> · 2026-06-05
Re: [PATCH mm-unstable v19 12/14] mm/khugepaged: avoid unnecessary mTHP collapse attempts · Lorenzo Stoakes <ljs@kernel.org> · 2026-06-05
Re: [PATCH mm-unstable v19 12/14] mm/khugepaged: avoid unnecessary mTHP collapse attempts · Lance Yang <lance.yang@linux.dev> · 2026-06-08
[PATCH mm-unstable v19 13/14] mm/khugepaged: run khugepaged for all orders · Nico Pache <npache@redhat.com> · 2026-06-05
[PATCH mm-unstable v19 14/14] Documentation: mm: update the admin guide for mTHP collapse · Nico Pache <npache@redhat.com> · 2026-06-05
Re: [PATCH mm-unstable v19 14/14] Documentation: mm: update the admin guide for mTHP collapse · "David Hildenbrand (Arm)" <david@kernel.org> · 2026-06-05
Re: [PATCH mm-unstable v19 14/14] Documentation: mm: update the admin guide for mTHP collapse · Lorenzo Stoakes <ljs@kernel.org> · 2026-06-05
Re: [PATCH mm-unstable v19 14/14] Documentation: mm: update the admin guide for mTHP collapse · Lance Yang <lance.yang@linux.dev> · 2026-06-08
Re: [PATCH mm-unstable v19 00/14] khugepaged: add mTHP collapse support · "David Hildenbrand (Arm)" <david@kernel.org> · 2026-06-05
Re: [PATCH mm-unstable v19 00/14] khugepaged: add mTHP collapse support · Lorenzo Stoakes <ljs@kernel.org> · 2026-06-05
Re: [PATCH mm-unstable v19 00/14] khugepaged: add mTHP collapse support · Andrew Morton <akpm@linux-foundation.org> · 2026-06-06

Re: [PATCH mm-unstable v19 06/14] mm/khugepaged: generalize collapse_huge_page for mTHP collapse

From: Lorenzo Stoakes <ljs@kernel.org>
Date: 2026-06-05 18:15:36
Also in: linux-doc, linux-mm, lkml

On Fri, Jun 05, 2026 at 07:48:17PM +0200, David Hildenbrand (Arm) wrote:

On 6/5/26 18:14, Nico Pache wrote:

quoted

Pass an order to collapse_huge_page to support collapsing anon memory to
arbitrary orders within a PMD. order indicates what mTHP size we are
attempting to collapse to.

For non-PMD collapse we must leave the anon VMA write locked until after
we collapse the mTHP-- in the PMD case all the pages are isolated, but in
the mTHP case this is not true, and we must keep the lock to prevent
access/changes to the page tables. This can happen if the rmap walkers hit
a pmd_none while the PMD entry is currently unavailable due to being
temporarily removed during the collapse phase.

To properly establish the page table hierarchy without violating any
expectations from certain architectures (e.g. MIPS), we must make sure to
have the PMD reinstalled before the PTEs, and hold both PTE/PMD locks
before calling update_mmu_cache_range() (if they are distinct locks).

Signed-off-by: Nico Pache <npache@redhat.com>
---

[...]

quoted

 	 */
 	__folio_mark_uptodate(folio);
-	pgtable = pmd_pgtable(_pmd);
-
 	spin_lock(pmd_ptl);
-	BUG_ON(!pmd_none(*pmd));
-	pgtable_trans_huge_deposit(mm, pmd, pgtable);
-	map_anon_folio_pmd_nopf(folio, pmd, vma, address);
+	VM_WARN_ON_ONCE(!pmd_none(*pmd));
+	if (is_pmd_order(order)) {
+		pgtable = pmd_pgtable(_pmd);
+		pgtable_trans_huge_deposit(mm, pmd, pgtable);
+		map_anon_folio_pmd_nopf(folio, pmd, vma, pmd_addr);
+	} else {
+		/*
+		 * Some architectures (e.g. MIPS) walk the live page table in
+		 * their implementation. update_mmu_cache_range() must be called
+		 * with a valid page table hierarchy and the PTE lock held.
+		 * Acquire it nested inside pmd_ptl when they are distinct locks.
+		 */
+		if (pte_ptl != pmd_ptl)
+			spin_lock_nested(pte_ptl, SINGLE_DEPTH_NESTING);
+		pmd_populate(mm, pmd, pmd_pgtable(_pmd));
+		map_anon_folio_pte_nopf(folio, pte, vma, start_addr,
+					  /*uffd_wp=*/ false);
+		if (pte_ptl != pmd_ptl)
+			spin_unlock(pte_ptl);
+	}
 	spin_unlock(pmd_ptl);

 	folio = NULL;

 	result = SCAN_SUCCEED;
 out_up_write:
+	if (anon_vma_locked)
+		anon_vma_unlock_write(vma->anon_vma);
+	if (pte)
+		pte_unmap(pte);

We re-enable some page table walkers before we unmap the PTE.

We still hold the mmap lock in write mode, so nothing would currently try
reclaiming the page table concurrently.

Reclaim uses rmap walkers though?

Oh you mean as in page table teardown, we're safe from higher level page table
teardown, but we're not safe from zap PTE page table teardown, as
CONFIG_PT_RECLAIM makes this possible on zap now.

That is RCU safe, so the unmap would keep us safe here, but now we could lose
the PTE page table.

But, only MADV_DONTNEED sets reclaim_pt = true, and that holds the VMA read lock
so we're safe.

And anyway:

MADV_DONTNEED - VMA read lock (we hold VMA write lock)
zap_vma_for_reaping() - mmap read lock
process teardown, munmap - mmap read lock
fault - vma/mmap read lock

So the vma/mmap locks save us from those.

So rmap-wise, only the i_mmap walkers remain (truncate, hole punch, et al. and
also hugetlbfs truncate/hole-punch which does its own nonsense too), but none of
those allow for reclaim_pt to happen in any case.

So yeah we're safe but we should what, reorder these 2 statements?

But yes I agree that can be a follow-up, nothing's broken AFAICT.

quoted hunk ↗ jump to hunk

So I guess this works right now, but we should likely rework that code later to
either revert both statements. Or maybe we can simply unmap like we did, and
simply remap before we call map_anon_folio_pte_nopf()? Remapping should not fail.

Alternatively to an unmap+remap, I think we could also unmap earlier for PMD

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6de935e76ceb..ba2a2508dda6 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c

@@ -1378,6 +1378,8 @@ static enum scan_result collapse_huge_page(struct

mm_struct *mm, unsigned long s
        if (is_pmd_order(order)) {
                anon_vma_unlock_write(vma->anon_vma);
                anon_vma_locked = false;
+               pte_unmap(pte);
+               pte = NULL;
        }

        result = __collapse_huge_page_copy(pte, folio, pmd, _pmd,

But this can also be handled later.

Yup I mean it'd be nicer to do it in one place if we can (+ impact of holding
RCU lock longer not an issue), but all this code needs rewokr anyway.

We now hold an anon_vma lock a bit longer for !pmd-collapse. But there is also
less to copy. If that bites us, we can try optimizing later.

Yeah I do worry about holding these locks longer. But we'll see.


So after another skim, I think this patch is ready for primetime. We can address
the things mentioned above later ... and any fallout can be fixed later, if any.

Acked-by: David Hildenbrand (Arm) <david@kernel.org>

Yes, also from my side - after a git range-diff and looking into above, LGTM,
so:

Reviewed-by: Lorenzo Stoakes <ljs@kernel.org>

Now to go look at the core algo patch :)


--
Cheers,

David

Cheers, Lorenzo

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help