Re: [PATCH v2 2/2] powerpc/mm: Add memory_block_size as a kernel parameter
From: David Hildenbrand <hidden>
Date: 2023-06-20 12:55:54
On 20.06.23 14:35, Michael Ellerman wrote:
> David Hildenbrand [off-list ref] writes:
>> On 09.06.23 08:08, Aneesh Kumar K.V wrote:
>>> Certain devices can possess non-standard memory capacities, not
>>> constrained to multiples of 1GB. Provide a kernel parameter so that
>>> we can map the device memory completely on memory hotplug.
>>
>> So, the unfortunate thing is that these devices would have worked out
>> of the box before the memory block size was increased from 256 MiB to
>> 1 GiB in these setups. Now, one has to fine-tune the memory block
>> size. The only other arch that I know, which supports setting the
>> memory block size, is x86 for special (large) UV systems -- and at
>> least in the past 128 MiB vs. 2 GiB memory blocks made a performance
>> difference during boot (maybe no longer today, who knows).
>>
>> Obviously, less tunable and getting stuff simply working out of the
>> box is preferable.
>>
>> Two questions:
>>
>> 1) Isn't there a way to improve auto-detection to fallback to 256 MiB
>>    in these setups, to avoid specifying these parameters?
>>
>> 2) Is the 256 MiB -> 1 GiB memory block size switch really worth it?
>>    On x86-64, experiments (with direct map fragmentation) showed that
>>    the effective performance boost is pretty insignificant, so I
>>    wonder how big the 1 GiB direct map performance improvement is.
>
> The other issue is simply the number of sysfs entries. With 64TB of
> memory and a 256MB block size you end up with ~250,000 directories in
> /sys/devices/system/memory.
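
For reference, the two numbers in play above can be checked with a few lines of arithmetic. This is only an illustrative sketch, not the kernel's detection logic: it assumes "64TB" and "256MB" mean binary units (TiB/MiB), the helper names are hypothetical, and the coverage check looks only at the region size, ignoring start-address alignment.

```python
MiB = 1 << 20
GiB = 1 << 30
TiB = 1 << 40


def memory_block_count(total_bytes, block_bytes):
    """Blocks (one sysfs directory each) needed to cover total_bytes."""
    return -(-total_bytes // block_bytes)  # ceiling division


def largest_covering_block_size(region_bytes, candidates=(GiB, 256 * MiB)):
    """Hypothetical helper: largest candidate block size that divides the
    region size evenly, so the whole region can be mapped in whole blocks.
    Simplification: only the size is checked, not the start address."""
    for size in sorted(candidates, reverse=True):
        if region_bytes % size == 0:
            return size
    return None  # no candidate covers the region completely


if __name__ == "__main__":
    # 64 TiB with 256 MiB blocks: 2**18 directories, i.e. the "~250,000" figure.
    print(memory_block_count(64 * TiB, 256 * MiB))  # 262144
    # The same memory with 1 GiB blocks needs a quarter as many.
    print(memory_block_count(64 * TiB, GiB))        # 65536
    # A device with 3.25 GiB of memory is not a multiple of 1 GiB,
    # so only the 256 MiB block size covers it completely.
    print(largest_covering_block_size(3 * GiB + 256 * MiB) // MiB)  # 256
```

The second helper is roughly what a size-based fallback to 256 MiB would have to decide; whether such a check is feasible at the point where powerpc picks the block size is exactly the open question in the thread.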

Yes, and so far on other archs we only optimize for that on UV x86
systems (with a default of 2 GiB). And that was added before we started
to speed up memory device lookups significantly using a radix tree IIRC.

It's worth noting that there was a discussion on:

(a) not creating these device sysfs entries (when configured on the
    cmdline); often, nobody really ends up using them to online/offline
    memory blocks. Then, the only primary user is lsmem.

(b) exposing logical devices (e.g., a DIMM) that can only be
    offlined/removed as a whole, instead of their individual memblocks
    (when configured on the cmdline). But for PPC64 that won't help.

But (a) gets more tricky if device drivers (and things like dax/kmem)
rely on user-space memory onlining/offlining.

-- 
Cheers,

David / dhildenb