Thread: 38 messages, 9 authors, 2026-01-23

Re: [PATCH v6 1/5] mm/zone_device: Reinitialize large zone device private folios

From: Balbir Singh <hidden>
Date: 2026-01-19 22:15:19
Also in: amd-gfx, dri-devel, intel-xe, kvm, linux-cxl, linux-mm, lkml, nouveau

On 1/20/26 07:35, Jason Gunthorpe wrote:
On Mon, Jan 19, 2026 at 03:09:00PM -0500, Zi Yan wrote:
diff --git a/mm/internal.h b/mm/internal.h
index e430da900430a1..a7d3f5e4b85e49 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -806,14 +806,21 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
 		atomic_set(&folio->_pincount, 0);
 		atomic_set(&folio->_entire_mapcount, -1);
 	}
-	if (order > 1)
+	if (order > 1) {
 		INIT_LIST_HEAD(&folio->_deferred_list);
+	} else {
+		folio->mapping = NULL;
+#ifdef CONFIG_MEMCG
+		folio->memcg_data = 0;
+#endif
+	}
prep_compound_head() is only called on >0 order pages. The above
code means when order == 1, folio->mapping and folio->memcg_data are
assigned NULL.
OK, fair enough, the conditionals would have to change and maybe it
shouldn't be called "compound_head" if it also cleans up normal pages.
 static inline void prep_compound_tail(struct page *head, int tail_idx)
 {
 	struct page *p = head + tail_idx;

+	p->flags.f &= ~0xffUL;	/* Clear possible order, page head */
No one cares about tail page flags if it is not checked in check_new_page()
from mm/page_alloc.c.
At least page_fixed_fake_head() does check PG_head in some
configurations. It does seem safer to clear it. Possibly order is
never used, but it is free to clear it.
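For readers following the mask in the hunk above, here is a minimal userspace sketch of what `&= ~0xffUL` does to a flags word. It is not kernel code: `DEMO_HEAD_BIT`, `clear_low_byte()` and `self_check()` are hypothetical names, and the bit chosen for the head flag is only an assumption for illustration, since the real layout of the page flags depends on kernel version and configuration.

```c
#include <assert.h>

/* Hypothetical bit used only for this illustration; not taken from the kernel. */
#define DEMO_HEAD_BIT	(1UL << 6)

/* Clear the low eight bits of a flags word and leave all higher bits untouched. */
static unsigned long clear_low_byte(unsigned long flags)
{
	return flags & ~0xffUL;
}

/* Stale low-byte state (a head bit plus a leftover order value) is dropped. */
static void self_check(void)
{
	unsigned long stale = 0xabcd00UL | DEMO_HEAD_BIT | 0x09UL;

	assert(clear_low_byte(stale) == 0xabcd00UL);
}
```

The point of the sketch is only that the mask is unconditional and cheap: whatever was left in the low byte is gone afterwards, and nothing above it changes.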
-	if (order)
-		prep_compound_page(page, order);
+	prep_compound_page(page, order);
prep_compound_page() should only be called for >0 order pages. This creates
another weirdness in device pages by assuming all pages are
compound.
OK
+	folio = page_folio(page);
+	folio->pgmap = pgmap;
+	folio_lock(folio);
+	folio_set_count(folio, 1);
/* clear possible previous page->mapping */
folio->mapping = NULL;

/* clear possible previous page->_nr_pages */
#ifdef CONFIG_MEMCG
	folio->memcg_data = 0;
#endif
This is reasonable too, but prep_compound_head() was doing more than
that: it also clears the order, and this path needs to clear the head
bit as well.  That's why it was appealing to reuse those functions, but
you are right that they are not ideal.

I suppose we want some prep_single_page(page) and some reorg to share
code with the other prep function.
There are __init_zone_device_page() and __init_single_page(), which
zero out the page and set the zone, pfn and nid, among other things.
I propose we use the current version with zone_device_free_folio() as is.

We can then figure out whether __init_zone_device_page() can be reused or
refactored to do this with core MM APIs.

This patch mixes the concepts of page and folio together, which
causes confusion. Core MM sees page and folio as two separate things:
1. page is the smallest internal physical memory management unit,
2. folio is an abstraction on top of pages, and other abstractions can be
   slab, ptdesc, and more (https://kernelnewbies.org/MatthewWilcox/Memdescs).
I think the users of zone_device_page_init() are principally trying to
create something that can be installed in a non-special PTE. Meaning
the output is always a folio because it is going to be read as a folio
in the page walkers.

Thus, the job of this function is to take the memory range of 2^order
pages starting at page and turn it into a single valid folio with a
refcount of 1.
If device pages have to be initialized on top of pages with obsolete state,
they should at least be initialized first as pages, then as folios, to avoid
confusion.
I don't think so. It should do the above job efficiently and iterate
over the page list exactly once.
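
As an illustration of the "one pass, one valid folio, refcount of 1" job described above, here is a userspace sketch. It is a toy model under stated assumptions, not the kernel implementation: `struct mock_page`, its fields and `mock_folio_init()` are hypothetical stand-ins, and the real struct page and folio layouts differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct page; fields are illustrative only. */
struct mock_page {
	unsigned long flags;		/* low byte models stale order/head state */
	void *mapping;
	unsigned long memcg_data;
	int refcount;
	int is_head;
	struct mock_page *head;		/* NULL on the first page */
	unsigned int order;		/* meaningful on the first page only */
};

/*
 * Turn pages[0 .. 2^order - 1] into one mock folio with refcount 1,
 * visiting every page exactly once. Returns the number of pages visited.
 */
static size_t mock_folio_init(struct mock_page *pages, unsigned int order)
{
	size_t nr = (size_t)1 << order;
	size_t i;

	assert(pages != NULL);
	assert(order < 8 * sizeof(size_t));

	for (i = 0; i < nr; i++) {
		struct mock_page *p = &pages[i];

		p->flags &= ~0xffUL;	/* drop stale low-byte state */
		p->mapping = NULL;	/* drop stale mapping */
		p->memcg_data = 0;	/* drop stale memcg data */
		p->is_head = (i == 0 && order > 0);
		p->head = (i == 0) ? NULL : &pages[0];
		p->order = (i == 0) ? order : 0;
		p->refcount = (i == 0) ? 1 : 0;
	}
	return nr;
}
```

In this model an order-0 range comes out as a single page with refcount 1 and no head marking, while a higher-order range gets one head and 2^order - 1 tails, all reset in the same loop.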

Jason
Agreed

Balbir