Re: [PATCH v2 2/2] powerpc/mm: Add memory_block_size as a kernel parameter
From: David Hildenbrand <hidden>
Date: 2023-06-19 16:30:05
On 19.06.23 18:17, Aneesh Kumar K.V wrote:
David Hildenbrand [off-list ref] writes:quoted
On 09.06.23 08:08, Aneesh Kumar K.V wrote:quoted
Certain devices can possess non-standard memory capacities, not constrained to multiples of 1GB. Provide a kernel parameter so that we can map the device memory completely on memory hotplug.So, the unfortunate thing is that these devices would have worked out of the box before the memory block size was increased from 256 MiB to 1 GiB in these setups. Now, one has to fine-tune the memory block size. The only other arch that I know, which supports setting the memory block size, is x86 for special (large) UV systems -- and at least in the past 128 MiB vs. 2 GiB memory blocks made a performance difference during boot (maybe no longer today, who knows). Obviously, less tunable and getting stuff simply working out of the box is preferable. Two questions: 1) Isn't there a way to improve auto-detection to fallback to 256 MiB in these setups, to avoid specifying these parameters?The patch does try to detect as much as possible by looking at device tree nodes and aperture window size. But there are still cases where we find a memory aperture of size X GB and device driver hotplug X.YGB memory.
Okay, and I assume we can't detect that case easily. Which interface is that device driver using to hotplug memory? It's quite surprising I have to say ...
quoted
2) Is the 256 MiB -> 1 GiB memory block size switch really worth it? On x86-64, experiments (with direct map fragmentation) showed that the effective performance boost is pretty insignificant, so I wonder how big the 1 GiB direct map performance improvement is.Tarun is running some tests to evaluate the impact. We used to use 1GiB mapping always. This was later switched to use memory block size to fix issues with memory unplug commit af9d00e93a4f ("powerpc/mm/radix: Create separate mappings for hot-plugged memory") explains some details related to that change.
IIUC, that commit (conditionally) increased the memory block size to avoid the splitting, correct? By that, it broke the device driver use case.
quoted
I guess the only real issue with 256 MiB memory blocks and 1 GiB direct mapping is memory unplug of boot memory: when unplugging a 256 MiB block, one would have to remap the 1 GiB range using 2 MiB ranges.quoted
... I was wondering what would happen if you simply leave the direct mapping in this corner case in place instead of doing this remapping. IOW, remove the memory but keep the direct map pointing at the removed memory. Nobody should be touching it, or are there any cases where that could hurt? Or is there any other reason why we really want 1 GiB memory blocks instead of to defaulting to 256 MiB the way it used to be?The idea we are working towards is to keep the memory block size small
That would be preferable, yes ...
but map the boot memory using 1G. An unplug request can split that 1G mapping later. We could look at the possibility of leaving that mapping without splitting. But not sure why we would want to do that if we can correctly split things. Right now there is no splitting support in powerpc.
If splitting over-complicates the matter (and well, it will even consume more memory), it might at least be worth looking into that. Yes, it's cleaner. I think there is also the option to fail memory offlining (and therefore unplug) if we have a 1 GiB mapping and don't want to split. For hotplugged memory it would always work to unplug again. aarch64 blocks any boot memory from getting unplugged. But I guess that might break existing use cases (unplug boot memory) on ppc64 that rely on ZONE_MOVABLE to have it working with guarantees, right? Could be optimized but not sure if that's the best approach. -- Cheers, David / dhildenb