Thread (13 messages) 13 messages, 3 authors, 2021-01-12

Re: [PATCH] mm: Teach pfn_to_online_page() about ZONE_DEVICE section collisions

From: David Hildenbrand <hidden>
Date: 2021-01-12 09:45:52
Also in: lkml

On 12.01.21 10:18, Dan Williams wrote:
On Thu, Jan 7, 2021 at 1:16 AM David Hildenbrand [off-list ref] wrote:
quoted
[...]
quoted
quoted
quoted
Well, I would love to have no surprises either. So far there was not
actual argument why the pmem reserved space cannot be fully initialized.
Yes, I'm still hoping Dan can clarify that.
Complexity and effective utility (once pfn_to_online_page() is fixed)
are the roadblocks in my mind. The altmap is there to allow for PMEM
capacity to be used as memmap space, so there would need to be code to
break that circular dependency and allocate a memmap for the metadata
space from DRAM and the rest of the memmap space for the data capacity
from pmem itself. That memmap-for-pmem-metadata will still represent
offline pages. So once pfn_to_online_page() is fixed, what pfn-walker
is going to be doing pfn_to_page() on PMEM metadata? Secondly, there
Assume I do

pgmap = get_dev_pagemap(pfn, NULL);
if (pgmap)
        return pfn_to_page(pfn);
return NULL;

on a random pfn because I want to inspect ZONE_DEVICE PFNs.
I keep getting hung up on the motivation to do random pfn inspection?

The problems we have found to date have required different solutions.
The KVM bug didn't use get_dev_pagemap() to inspect the pfn because it
could rely on the fact that the page already had an elevated reference
count. The get_user_pages() path only looks up ZONE_DEVICE pfns when
it see {pte,pmd,pud}_devmap set in the page table entry. pfn walkers
have been a problem, but with pfn_to_online_page() fixed what is the
remaining motivation to inspect ZONE_DEVICE pfns?
1) Let's assume we want to implement zone shrinking
(remove_pfn_range_from_zone()->shrink_zone_span()) for ZONE_DEVICE at
some point.

A simple approach would be going via get_dev_pagemap(pfn,
NULL)->pfn_to_page(pfn), checking for the zone.

If that's not possible, then extending dev_pagemap (e.g., indicating the
nid) might also work (unless there is another way to get the nid).


2) Let's take a look at mm/memory-failure.c:memory_failure_dev_pagemap()

IIUC, we might end up doing pfn_to_page(pfn) on a pfn in the reserved
altmap space, so one with an uninitialized memmap.

E.g., in dax_lock_page() we access page->mapping, which might just be
garbage. dax_mapping() will de-reference garbage.

Most probably I am missing something here.



Question is: what are the expectations regarding the memmap if
get_dev_pagemap() succeeded.

I'm fine documenting that "get_dev_pagemap() does not guarantee that the
"struct page" returned by pfn_to_page() was initialized and can safely
be used. E.g., it might be a pfn in the reserved altmap space, for which
the memmap is never initialized. Accessing it might be dangerous.".

Then, there has to be a check at relevant places (e.g.,
memory_failure_dev_pagemap()), checking somehow if the memmap content
can actually be used.

-- 
Thanks,

David / dhildenb

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help