Re: [PATCH v1] memory-hotplug.rst: complete admin-guide overhaul
From: David Hildenbrand <hidden>
Date: 2021-06-08 13:04:37
Also in: linux-mm, lkml
quoted
+ZONE_MOVABLE
+============
+
+ZONE_MOVABLE is an important mechanism for more reliable memory offlining.
+Further, having system RAM managed by ZONE_MOVABLE instead of one of the
+kernel zones can increase the number of possible transparent huge pages and
+dynamically allocated huge pages.
+

I'd move the first two paragraphs from "Zone Imbalances" here to provide some context what is movable and what is unmovable allocation.
Makes sense. [...]
quoted
-How to offline memory
----------------------
+Considerations

ZONE_MOVABLE Sizing Considerations ?
Ack
I'd also move the contents of "Boot Memory and ZONE_MOVABLE" here (with some adjustments):

  By default, all the memory configured at boot time is managed by the kernel zones and ZONE_MOVABLE is not used.

  To enable ZONE_MOVABLE to include the memory present at boot and to control the ratio between movable and kernel zones there are two command line options: ``kernelcore=`` and ``movablecore=``. See Documentation/admin-guide/kernel-parameters.rst for their description.
Makes sense. I'll move it to the end of the "ZONE_MOVABLE Sizing Considerations" section.
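To make the "ratio between movable and kernel zones" concrete, here is a minimal sketch of how one might check the current split on a running system by summing the "managed" page counts per zone from /proc/zoneinfo. The function names and the SAMPLE text are made up for illustration and are not part of the patch; the sketch assumes the usual "Node N, zone NAME" headers and per-zone "managed" lines.

```python
import re


def managed_pages_by_zone(zoneinfo_text):
    """Sum the 'managed' page counts per zone name across all nodes."""
    totals = {}
    zone = None
    for line in zoneinfo_text.splitlines():
        header = re.match(r"Node\s+\d+,\s+zone\s+(\S+)", line)
        if header:
            zone = header.group(1)
            totals.setdefault(zone, 0)
            continue
        field = re.match(r"\s+managed\s+(\d+)", line)
        if field and zone is not None:
            totals[zone] += int(field.group(1))
    return totals


def movable_share(totals):
    """Fraction of all managed pages that sit in ZONE_MOVABLE."""
    total = sum(totals.values())
    return totals.get("Movable", 0) / total if total else 0.0


# Hypothetical, heavily trimmed sample in the style of /proc/zoneinfo.
SAMPLE = """\
Node 0, zone   Normal
  pages free     1000
        managed  3000
Node 0, zone  Movable
  pages free     500
        managed  1000
"""

if __name__ == "__main__":
    try:
        with open("/proc/zoneinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the made-up sample above
    totals = managed_pages_by_zone(text)
    print(totals)
    print("movable share: {:.1%}".format(movable_share(totals)))
```

With the default boot configuration described above, one would expect the "Movable" entry to be 0 until ``kernelcore=`` or ``movablecore=`` is used or memory is onlined to ZONE_MOVABLE.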
quoted
+--------------
-You can offline a memory block by using the same sysfs interface that was used
-in memory onlining::
+We usually expect that a large portion of available system RAM will actually
+be consumed by user space, either directly or indirectly via the page cache. In
+the normal case, ZONE_MOVABLE can be used when allocating such pages just fine.
-	% echo offline > /sys/devices/system/memory/memoryXXX/state
+With that in mind, it makes sense that we can have a big portion of system RAM
+managed by ZONE_MOVABLE. However, there are some things to consider when
+using ZONE_MOVABLE, especially when fine-tuning zone ratios:
-If offline succeeds, the state of the memory block is changed to be "offline".
-If it fails, some error core (like -EBUSY) will be returned by the kernel.
-Even if a memory block does not belong to ZONE_MOVABLE, you can try to offline
-it. If it doesn't contain 'unmovable' memory, you'll get success.
+- Having a lot of offline memory blocks. Even offline memory blocks consume
+  memory for metadata and page tables in the direct map; having a lot of
+  offline memory blocks is not a typical case, though.
+
+- Memory ballooning. Some memory ballooning implementations, such as
+  the Hyper-V balloon, the XEN balloon, the vbox balloon and the VMWare

So, everyone except virtio-mem? ;-)
Well, virtio-mem does not classify as a memory balloon in that sense, as it only operates on its own device memory ;)

virtio-balloon and pseries CMM support balloon compaction.
I'd drop the names, because if some of those implement balloon compaction later, they will surely forget to update the docs.
I can do the opposite and mention the ones that already do. Some most probably will never support it.

"Memory ballooning without balloon compaction is incompatible with ZONE_MOVABLE. Only some implementations, such as virtio-balloon and pseries CMM, fully support balloon compaction."
quoted
+  balloon with huge pages don't support balloon compaction and, thereby
+  ZONE_MOVABLE.
+
+  Further, CONFIG_BALLOON_COMPACTION might be disabled. In that case, balloon
+  inflation will only perform unmovable allocations and silently create a
+  zone imbalance, usually triggered by inflation requests from the
+  hypervisor.
+
+- Gigantic pages are unmovable, resulting in user space consuming a
+  lot of unmovable memory.
+
+- Huge pages are unmovable when an architectures does not support huge
+  page migration, resulting in a similar issue as with gigantic pages.
+
+- Page tables are unmovable. Excessive swapping, mapping extremely large
+  files or ZONE_DEVICE memory can be problematic, although only
+  really relevant in corner cases. When we manage a lot of user space memory
+  that has been swapped out or is served from a file/pmem/... we still need

^ persistent memory
Agreed.
quoted
+  a lot of page tables to manage that memory once user space accessed that
+  memory once.
+
+- DAX: when we have a lot of ZONE_DEVICE memory added to the system as DAX
+  and we are not using an altmap to allocate the memmap from device memory
+  directly, we will have to allocate the memmap for this memory from the
+  kernel zones.

I'm not sure an admin-guide reader will know when we use an altmap and when we don't. Maybe:

  DAX: in certain DAX configurations the memory map for the device memory will be allocated from the kernel zones.
Indeed, simpler and communicates the same message.

I'll also add

"KASAN can have a significant memory overhead, for example, consuming 1/8th of the total system memory size as (unmovable) tracking metadata."

Thanks Mike!

--
Thanks,

David / dhildenb