Re: [PATCH v2] kernel/resource: Fix locking in request_free_mem_region
From: David Hildenbrand <hidden>
Date: 2021-03-31 06:41:53
Also in: lkml
On 31.03.21 08:19, Alistair Popple wrote:
> On Tuesday, 30 March 2021 8:13:32 PM AEDT David Hildenbrand wrote:
>> On 29.03.21 03:37, Alistair Popple wrote:
>>> On Friday, 26 March 2021 7:57:51 PM AEDT David Hildenbrand wrote:
>>>> On 26.03.21 02:20, Alistair Popple wrote:
>>>>> request_free_mem_region() is used to find an empty range of physical
>>>>> addresses for hotplugging ZONE_DEVICE memory. It does this by iterating
>>>>> over the range of possible addresses using region_intersects() to see if
>>>>> the range is free.
>>>>
>>>> Just a high-level question: how does this interact with memory
>>>> hot(un)plug? IOW, who defines and manages the "range of possible
>>>> addresses"?
>>>
>>> Both the driver and the maximum physical address bits available define
>>> the range of possible addresses for device private memory. From
>>> __request_free_mem_region():
>>>
>>>     end = min_t(unsigned long, base->end, (1UL << MAX_PHYSMEM_BITS) - 1);
>>>     addr = end - size + 1UL;
>>>
>>> There is no lower address range bound here so it is effectively zero. The
>>> code will try to allocate the highest possible physical address first and
>>> continue searching down for a free block. Does that answer your question?
>>
>> Oh, sorry, the first time I had a look I got it wrong - I thought
>> (1UL << MAX_PHYSMEM_BITS) would be the lower address limit.
>>
>> That looks indeed problematic to me. You might end up reserving an iomem
>> region that could be used e.g., by memory hotplug code later. If someone
>> plugs a DIMM or adds memory via different approaches (virtio-mem), memory
>> hotplug (via add_memory()) would fail.
>>
>> You never should be touching physical memory area reserved for memory
>> hotplug, i.e., via SRAT.
>>
>> What is the expectation here?
>
> Most drivers call request_free_mem_region() with iomem_resource as the base.
> So zone device private pages currently tend to get allocated from the top of
> that.

Okay, but you could still "steal" iomem space that does not belong to you, and the firmware will be unaware of that (e.g., it might hotplug a DIMM in these spots). This is really nasty (although I guess as you allocate top down, it will happen rarely).
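
To make the concern concrete, here is a minimal userspace sketch of the top-down search as I read it from the two lines quoted above. This is not the kernel code: every name prefixed with model_ is made up for illustration, the 46-bit limit is only an example value, and the busy[] array merely stands in for the resource tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: example value only; the real limit is per-architecture. */
#define MODEL_MAX_PHYSMEM_BITS 46

/* Hypothetical stand-in for an entry in the resource tree. */
struct model_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

/* Return 1 if [addr, addr + size - 1] overlaps no entry in busy[]. */
int model_range_is_free(const struct model_range *busy, size_t n,
			uint64_t addr, uint64_t size)
{
	uint64_t last = addr + size - 1;
	size_t i;

	for (i = 0; i < n; i++)
		if (addr <= busy[i].end && busy[i].start <= last)
			return 0;
	return 1;
}

/*
 * Walk down from the top of the addressable range in steps of size and
 * return the start of the first free block, or 0 if none is found.
 */
uint64_t model_find_free_top_down(const struct model_range *busy, size_t n,
				  uint64_t base_start, uint64_t base_end,
				  uint64_t size)
{
	uint64_t limit = (UINT64_C(1) << MODEL_MAX_PHYSMEM_BITS) - 1;
	uint64_t end = base_end < limit ? base_end : limit;
	uint64_t addr;

	assert(size > 0 && size <= end);

	/* addr > size guarantees that addr -= size cannot wrap around. */
	for (addr = end - size + 1; addr > size && addr >= base_start;
	     addr -= size)
		if (model_range_is_free(busy, n, addr, size))
			return addr;
	return 0;
}
```

The point of the model: the search only consults what is already registered in busy[]. A range that the firmware has set aside for hotplug but that has not been added to the tree yet looks free, so the search can hand it out.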

> By definition ZONE_DEVICE private pages are unaddressable from the CPU. So in
> terms of expectation I think all that is really required for ZONE_DEVICE
> private pages (at least for Nouveau) is a valid range of physical addresses
> that allows page_to_pfn() and pfn_to_page() to work correctly. To make this
> work drivers add the pages via memremap_pages() -> pagemap_range() ->
> add_pages().

So you'd actually want some region above the hotpluggable/addressable
range, e.g., above MAX_PHYSMEM_BITS.

The maximum number of sections we can have is defined by

	#define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)

You'd e.g., want an extra space like (to be improved)

	#define DEVMEM_BITS 1
	#define SECTIONS_SHIFT (MAX_PHYSMEM_BITS + DEVMEM_BITS - SECTION_SIZE_BITS)

And do the search only within that range.

-- 
Thanks,

David / dhildenb