Re: [PATCH v6 1/5] mm/zone_device: Reinitialize large zone device private folios
From: Matthew Brost <matthew.brost@intel.com>
Date: 2026-01-16 20:31:34
Also in:
amd-gfx, dri-devel, intel-xe, kvm, linux-cxl, linux-mm, lkml, nouveau
On Fri, Jan 16, 2026 at 08:17:22PM +0100, Vlastimil Babka wrote:
On 1/16/26 18:49, Jason Gunthorpe wrote:quoted
On Fri, Jan 16, 2026 at 12:10:16PM +0100, Francois Dugast wrote:quoted
-void zone_device_page_init(struct page *page, unsigned int order) +void zone_device_page_init(struct page *page, struct dev_pagemap *pgmap, + unsigned int order) { + struct page *new_page = page; + unsigned int i; + VM_WARN_ON_ONCE(order > MAX_ORDER_NR_PAGES); + for (i = 0; i < (1UL << order); ++i, ++new_page) { + struct folio *new_folio = (struct folio *)new_page; + + /* + * new_page could have been part of previous higher order folio + * which encodes the order, in page + 1, in the flags bits. We + * blindly clear bits which could have set my order field here, + * including page head. + */ + new_page->flags.f &= ~0xffUL; /* Clear possible order, page head */ + +#ifdef NR_PAGES_IN_LARGE_FOLIO + /* + * This pointer math looks odd, but new_page could have been + * part of a previous higher order folio, which sets _nr_pages + * in page + 1 (new_page). Therefore, we use pointer casting to + * correctly locate the _nr_pages bits within new_page which + * could have modified by previous higher order folio. + */ + ((struct folio *)(new_page - 1))->_nr_pages = 0; +#endifThis seems too weird, why is it in the loop? There is only one _nr_pages per folio.I suppose we could be getting say an order-9 folio that was previously used as two order-8 folios? And each of them had their _nr_pages in their head
Yes, this is a good example. At this point we have idea what previous allocation(s) order(s) were - we could have multiple places in the loop where _nr_pages is populated, thus we have to clear this everywhere.
and we can't know that at this point so we have to reset everything?
Yes, see above, correct. We have no visablity to previous state of the pages so the only option is to reset everything.
AFAIU this would not be a problem if the clearing of the previous state was done upon freeing, as e.g. v4 did, but I think you also argued it meant processing the pages when freeing and then again at reallocation, so it's now like this instead?
Yes, if we cleanup the previous folio state upon freeing, then this problem goes away but the we back passing in the order as argument to ->folio_free().
Or maybe you mean that stray _nr_pages in some tail page from previous lifetimes can't affect the current lifetime in a wrong way for something looking at said page? I don't know immediately.quoted
This is mostly zeroing some memory in the tail pages? Why? Why can't this use the normal helpers, like memmap_init_compound()? struct folio *new_folio = page /* First 4 tail pages are part of struct folio */ for (i = 4; i < (1UL << order); i++) { prep_compound_tail(..) } prep_comound_head(page, order) new_folio->_nr_pages = 0 ??
I've beat this to death with Alistair, normal helpers do not work here. An order zero allocation could have _nr_pages set in its page, new_folio->_nr_pages is page + 1 memory. Matt
quoted
Jason