[PATCH v3 01/19] mm/hugetlb: Fix boot panic with CONFIG_DEBUG_VM and HVO bootmem pages
From: Muchun Song <hidden>
Date: 2026-06-02 10:11:00
Also in: linux-mm, lkml
Subsystems: hugetlb subsystem, memory management, memory management - core, the rest
Maintainers: Muchun Song, Oscar Salvador, Andrew Morton, David Hildenbrand, Linus Torvalds
Commit 622026e87c40 ("mm/hugetlb: remove fake head pages") switched
HVO to reuse per-zone shared tail pages from zone->vmemmap_tails[].
Those shared tail pages were initialized in hugetlb_vmemmap_init(), but
bootmem HugeTLB folios are prepared earlier from gather_bootmem_prealloc().
With hugetlb_free_vmemmap=on, prep_and_add_bootmem_folios() can access
pageblock flags on bootmem HugeTLB pages whose tail struct pages are
already backed by the shared tail page. On CONFIG_DEBUG_VM kernels,
get_pfnblock_bitmap_bitidx() then dereferences the still-uninitialized
shared tail page and can panic during boot.
Initialize zone->vmemmap_tails[] from gather_bootmem_prealloc(), before
bootmem HugeTLB folios are processed, and drop the later initialization
from hugetlb_vmemmap_init().
This bug only affects CONFIG_DEBUG_VM kernels, where the relevant
assertion is evaluated.
Fixes: 622026e87c40 ("mm/hugetlb: remove fake head pages")
Signed-off-by: Muchun Song <redacted>
Acked-by: Oscar Salvador <osalvador@suse.de>
---
v2->v3:
- add a comment explaining why shared tail pages must be initialized from
gather_bootmem_prealloc() before hugetlb_vmemmap_init() runs (per Oscar
Salvador)
- update the stale sparse-vmemmap comment to point to gather_bootmem_prealloc()
as the bootmem HugeTLB shared-tail initialization site (reported by Oscar
Salvador)
---
mm/hugetlb.c | 25 +++++++++++++++++++++++++
mm/hugetlb_vmemmap.c | 17 -----------------
mm/sparse-vmemmap.c | 2 +-
3 files changed, 26 insertions(+), 18 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 571212b80835..cd55524c7e30 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3365,6 +3365,31 @@ static void __init gather_bootmem_prealloc(void)
 		.max_threads	= num_node_state(N_MEMORY),
 		.numa_aware	= true,
 	};
+#ifdef CONFIG_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
+	struct zone *zone;
+
+	for_each_zone(zone) {
+		for (int i = 0; i < NR_VMEMMAP_TAILS; i++) {
+			struct page *tail, *p;
+			unsigned int order;
+
+			tail = zone->vmemmap_tails[i];
+			if (!tail)
+				continue;
+
+			order = i + VMEMMAP_TAIL_MIN_ORDER;
+			p = page_to_virt(tail);
+			/*
+			 * prep_and_add_bootmem_folios() can access pageblock
+			 * flags on bootmem HugeTLB pages, so initialize the
+			 * shared tail struct pages here before bootmem folios
+			 * start using them.
+			 */
+			for (int j = 0; j < PAGE_SIZE / sizeof(struct page); j++)
+				init_compound_tail(p + j, NULL, order, zone);
+		}
+	}
+#endif
 
 	padata_do_multithreaded(&job);
 }
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 133b46dfb09f..c713c0d2593a 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -870,27 +870,10 @@ static const struct ctl_table hugetlb_vmemmap_sysctls[] = {
 static int __init hugetlb_vmemmap_init(void)
 {
 	const struct hstate *h;
-	struct zone *zone;
 
 	/* HUGETLB_VMEMMAP_RESERVE_SIZE should cover all used struct pages */
 	BUILD_BUG_ON(__NR_USED_SUBPAGE > HUGETLB_VMEMMAP_RESERVE_PAGES);
 
-	for_each_zone(zone) {
-		for (int i = 0; i < NR_VMEMMAP_TAILS; i++) {
-			struct page *tail, *p;
-			unsigned int order;
-
-			tail = zone->vmemmap_tails[i];
-			if (!tail)
-				continue;
-
-			order = i + VMEMMAP_TAIL_MIN_ORDER;
-			p = page_to_virt(tail);
-			for (int j = 0; j < PAGE_SIZE / sizeof(struct page); j++)
-				init_compound_tail(p + j, NULL, order, zone);
-		}
-	}
-
 	for_each_hstate(h) {
 		if (hugetlb_vmemmap_optimizable(h)) {
 			register_sysctl_init("vm", hugetlb_vmemmap_sysctls);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index 112ccf9c71ca..8f41b73fb674 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -342,7 +342,7 @@ static __meminit struct page *vmemmap_get_tail(unsigned int order, struct zone *
 	 *
 	 * Any initialization done here will be overwritten by memmap_init().
 	 *
-	 * hugetlb_vmemmap_init() will take care of initialization after
+	 * gather_bootmem_prealloc() will take care of initialization after
 	 * memmap_init().
 	 */
--
2.54.0