[PATCH v13 mm-new 10/16] khugepaged: add per-order mTHP collapse failure statistics
From: Nico Pache <npache@redhat.com>
Date: 2025-12-01 17:49:15
Also in: linux-doc, linux-mm, lkml
Subsystem: documentation, memory management, memory management - misc, memory management - thp (transparent huge page), the rest
Maintainers: Jonathan Corbet, Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Linus Torvalds
Add three new mTHP statistics to track collapse failures for different
orders when encountering swap PTEs, excessive none PTEs, and shared PTEs:

- collapse_exceed_swap_pte: Counts when mTHP collapse fails due to swap
  PTEs

- collapse_exceed_none_pte: Counts when mTHP collapse fails due to
  exceeding the none PTE threshold for the given order

- collapse_exceed_shared_pte: Counts when mTHP collapse fails due to
  shared PTEs

These statistics complement the existing THP_SCAN_EXCEED_* events by
providing per-order granularity for mTHP collapse attempts. The stats are
exposed via sysfs under
`/sys/kernel/mm/transparent_hugepage/hugepages-*/stats/` for each
supported hugepage size.

As we currently don't support collapsing mTHPs that contain a swap or
shared entry, those statistics keep track of how often we are
encountering failed mTHP collapses due to these restrictions.

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 24 ++++++++++++++++++++++
 include/linux/huge_mm.h                    |  3 +++
 mm/huge_memory.c                           |  7 +++++++
 mm/khugepaged.c                            | 16 ++++++++++++---
 4 files changed, 47 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index c51932e6275d..d396d1bfb274 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -714,6 +714,30 @@ nr_anon_partially_mapped
        an anonymous THP as "partially mapped" and count it here, even though it
        is not actually partially mapped anymore.
 
+collapse_exceed_none_pte
+       The number of collapse attempts that failed due to exceeding the
+       max_ptes_none threshold. For mTHP collapse, currently only max_ptes_none
+       values of 0 and (HPAGE_PMD_NR - 1) are supported. Any other value will
+       emit a warning and no mTHP collapse will be attempted. khugepaged will
+       try to collapse to the largest enabled (m)THP size; if it fails, it will
+       try the next lower enabled mTHP size. This counter records the number of
+       times a collapse attempt was skipped for exceeding the max_ptes_none
+       threshold, and khugepaged will move on to the next available mTHP size.
+
+collapse_exceed_swap_pte
+       The number of anonymous mTHP pte ranges which were unable to collapse due
+       to containing at least one swap PTE. Currently khugepaged does not
+       support collapsing mTHP regions that contain a swap PTE. This counter can
+       be used to monitor the number of khugepaged mTHP collapses that failed
+       due to the presence of a swap PTE.
+
+collapse_exceed_shared_pte
+       The number of anonymous mTHP pte ranges which were unable to collapse due
+       to containing at least one shared PTE. Currently khugepaged does not
+       support collapsing mTHP pte ranges that contain a shared PTE. This
+       counter can be used to monitor the number of khugepaged mTHP collapses
+       that failed due to the presence of a shared PTE.
+
 As the system ages, allocating huge pages may be expensive as the
 system uses memory compaction to copy data around memory to free a
 huge page for use. There are some counters in ``/proc/vmstat`` to help
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index f93365e182b4..1082b78e794d 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -144,6 +144,9 @@ enum mthp_stat_item {
 	MTHP_STAT_SPLIT_DEFERRED,
 	MTHP_STAT_NR_ANON,
 	MTHP_STAT_NR_ANON_PARTIALLY_MAPPED,
+	MTHP_STAT_COLLAPSE_EXCEED_SWAP,
+	MTHP_STAT_COLLAPSE_EXCEED_NONE,
+	MTHP_STAT_COLLAPSE_EXCEED_SHARED,
 	__MTHP_STAT_COUNT
 };
 
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index c1e1e91b0e61..b4d9b3ac9a7c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -639,6 +639,10 @@ DEFINE_MTHP_STAT_ATTR(split_failed, MTHP_STAT_SPLIT_FAILED);
 DEFINE_MTHP_STAT_ATTR(split_deferred, MTHP_STAT_SPLIT_DEFERRED);
 DEFINE_MTHP_STAT_ATTR(nr_anon, MTHP_STAT_NR_ANON);
 DEFINE_MTHP_STAT_ATTR(nr_anon_partially_mapped, MTHP_STAT_NR_ANON_PARTIALLY_MAPPED);
+DEFINE_MTHP_STAT_ATTR(collapse_exceed_swap_pte, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
+DEFINE_MTHP_STAT_ATTR(collapse_exceed_none_pte, MTHP_STAT_COLLAPSE_EXCEED_NONE);
+DEFINE_MTHP_STAT_ATTR(collapse_exceed_shared_pte, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
+
 
 static struct attribute *anon_stats_attrs[] = {
 	&anon_fault_alloc_attr.attr,
@@ -655,6 +659,9 @@ static struct attribute *anon_stats_attrs[] = {
 	&split_deferred_attr.attr,
 	&nr_anon_attr.attr,
 	&nr_anon_partially_mapped_attr.attr,
+	&collapse_exceed_swap_pte_attr.attr,
+	&collapse_exceed_none_pte_attr.attr,
+	&collapse_exceed_shared_pte_attr.attr,
 	NULL,
 };
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b2ea56c9bb42..efb8a47af65a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -604,7 +604,9 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
-				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+				if (!is_mthp_order(order))
+					count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
+				count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_NONE);
 				goto out;
 			}
 		}
@@ -634,10 +636,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			 * shared may cause a future higher order collapse on a
 			 * rescan of the same range.
 			 */
-			if (is_mthp_order(order) || (cc->is_khugepaged &&
-			    shared > khugepaged_max_ptes_shared)) {
+			if (is_mthp_order(order)) {
+				result = SCAN_EXCEED_SHARED_PTE;
+				count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
+				goto out;
+			}
+
+			if (cc->is_khugepaged &&
+			    shared > khugepaged_max_ptes_shared) {
 				result = SCAN_EXCEED_SHARED_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
+				count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SHARED);
 				goto out;
 			}
 		}
@@ -1086,6 +1095,7 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 		 * range.
 		 */
 		if (is_mthp_order(order)) {
+			count_mthp_stat(order, MTHP_STAT_COLLAPSE_EXCEED_SWAP);
 			pte_unmap(pte);
 			mmap_read_unlock(mm);
 			result = SCAN_EXCEED_SWAP_PTE;
--
2.51.1
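
For anyone wanting to watch the new counters from userspace, here is a minimal sketch, not part of the series. It assumes the sysfs layout named in the commit message (`/sys/kernel/mm/transparent_hugepage/hugepages-*/stats/`) and the three file names this patch adds. The helper names `read_counter` and `dump_collapse_stats` are hypothetical.

```c
#include <glob.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read one sysfs counter file that holds a single
 * unsigned decimal value. Returns 0 on success, -1 on any failure.
 */
static int read_counter(const char *path, unsigned long long *val)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", val) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: print "<path> <value>" for every readable file
 * matching @pattern. Returns the number of counters printed, 0 if
 * nothing matched, or -1 if glob() itself failed.
 */
static int dump_collapse_stats(const char *pattern)
{
	glob_t g;
	size_t i;
	int printed = 0;
	int rc;

	memset(&g, 0, sizeof(g));
	rc = glob(pattern, 0, NULL, &g);
	if (rc != 0) {
		globfree(&g);
		return rc == GLOB_NOMATCH ? 0 : -1;
	}

	for (i = 0; i < g.gl_pathc; i++) {
		unsigned long long v;

		/* Skip files that cannot be read or parsed. */
		if (read_counter(g.gl_pathv[i], &v) == 0) {
			printf("%s %llu\n", g.gl_pathv[i], v);
			printed++;
		}
	}
	globfree(&g);
	return printed;
}
```

Calling `dump_collapse_stats("/sys/kernel/mm/transparent_hugepage/hugepages-*/stats/collapse_exceed_*")` from a `main()` should print one line per hugepage size and counter on a kernel carrying this patch. On a kernel without it, the pattern matches nothing and the function returns 0.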