Thread: 33 messages, 9 authors, 2020-08-18

Re: [PATCH v5 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline

From: David Hildenbrand <hidden>
Date: 2020-07-01 11:31:10
Also in: linux-mm, lkml

On 01.07.20 13:06, David Hildenbrand wrote:
> On 01.07.20 13:01, Srikar Dronamraju wrote:
>> * David Hildenbrand [2020-07-01 12:15:54]:
>>
>>> On 01.07.20 12:04, Srikar Dronamraju wrote:
>>>> * Michal Hocko [2020-07-01 10:42:00]:
>>>>
>>>>>> 2. The existence of the dummy node also leads to inconsistent information:
>>>>>> the number of online nodes does not match the information in the
>>>>>> device tree and resource-dump.
>>>>>>
>>>>>> 3. When the dummy node is present, single-node non-NUMA systems show up
>>>>>> as NUMA systems and numa_balancing gets enabled. This means we take
>>>>>> the hit from unnecessary NUMA hinting faults.
>>>>>
>>>>> I have to say that I dislike the node online/offline state and exporting
>>>>> it directly to userspace. Users should only care whether the node has
>>>>> memory/CPUs. NUMA nodes can be online without any memory: just offline
>>>>> all the present memory blocks without physically hot-removing them and
>>>>> you are in the same situation. If users are confused by the output of
>>>>> tools like numactl -H, then those tools could be updated to hide nodes
>>>>> that have neither memory nor CPUs.
>>>>>
>>>>> The autonuma problem sounds interesting, but again this patch doesn't
>>>>> really solve the underlying problem, because I strongly suspect that the
>>>>> problem is still there when a NUMA node has all of its memory offlined,
>>>>> as mentioned above.
>>>>>
>>>>> While I completely agree that making node 0 special is wrong, I still
>>>>> have a hard time reviewing this very simple-looking patch, because all
>>>>> the NUMA initialization is spread around so much that this might just
>>>>> blow up in unexpected places. IIRC we discussed testing in the previous
>>>>> version, and David provided a way to emulate these configurations on
>>>>> x86. Did you manage to use those instructions for additional testing on
>>>>> architectures other than ppc?
>>>>
>>>> I have tried all the steps that David mentioned and reported back at
>>>> https://lore.kernel.org/lkml/20200511174731.GD1961@linux.vnet.ibm.com/t/#u
>>>>
>>>> In summary, David's steps still do not create a memoryless/cpuless node
>>>> on an x86 VM.
>>>
>>> Now, that is wrong. You get a memoryless/cpuless node, which is *not
>>> online*. Once you hotplug some memory, it will switch online. Once you
>>> remove memory, it will switch back offline.
>>
>> Let me clarify: we are looking for a node 0 that is cpuless/memoryless at
>> boot. The code in question tries to handle a cpuless/memoryless node 0 at
>> boot.
>
> I was just correcting your statement, because it was wrong.
>
> It could be that the x86 code maps PXM 1 to node 0 because PXM 0 has
> neither CPUs nor memory. That would imply that we can, in fact, never have
> node 0 offline during boot.

Yep, looks like it.

[    0.009726] SRAT: PXM 1 -> APIC 0x00 -> Node 0
[    0.009727] SRAT: PXM 1 -> APIC 0x01 -> Node 0
[    0.009727] SRAT: PXM 1 -> APIC 0x02 -> Node 0
[    0.009728] SRAT: PXM 1 -> APIC 0x03 -> Node 0
[    0.009731] ACPI: SRAT: Node 0 PXM 1 [mem 0x00000000-0x0009ffff]
[    0.009732] ACPI: SRAT: Node 0 PXM 1 [mem 0x00100000-0xbfffffff]
[    0.009733] ACPI: SRAT: Node 0 PXM 1 [mem 0x100000000-0x13fffffff]
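
The SRAT output above fits the hypothesis that node IDs are handed out in the order PXMs first appear in SRAT, so a PXM without CPUs or memory never claims node 0. Below is a minimal Python model of that first-come allocation. It assumes the behaviour of the kernel's acpi_map_pxm_to_node() (a newly seen PXM gets the lowest unused node ID), ignores the MAX_NUMNODES limit, and uses hypothetical names (PxmMapper, parse_srat); it is a sketch, not the kernel implementation.

```python
# Simplified user-space model (an assumption, not kernel code) of how
# ACPI SRAT proximity domains (PXMs) are turned into Linux node IDs.
# Assumed semantics: the first time a PXM is seen it receives the
# lowest node ID not yet handed out.

NUMA_NO_NODE = -1


class PxmMapper:
    """Hypothetical helper holding the PXM -> node table."""

    def __init__(self):
        self.pxm_to_node = {}     # PXM -> node ID, filled lazily
        self.nodes_found = set()  # node IDs already handed out

    def map_pxm_to_node(self, pxm):
        if pxm < 0:
            return NUMA_NO_NODE
        node = self.pxm_to_node.get(pxm)
        if node is None:
            # Lowest node ID that has not been handed out yet.
            node = 0
            while node in self.nodes_found:
                node += 1
            self.pxm_to_node[pxm] = node
            self.nodes_found.add(node)
        return node


def parse_srat(pxms):
    """Map every PXM in SRAT order; return the resulting PXM -> node table."""
    mapper = PxmMapper()
    for pxm in pxms:
        mapper.map_pxm_to_node(pxm)
    return mapper.pxm_to_node


# The guest from the log: 4 CPU entries and 3 memory ranges, all in PXM 1.
# An empty PXM 0 produces no SRAT entries, so it never claims node 0.
print(parse_srat([1] * 7))     # {1: 0}
# If PXM 0 had a CPU or memory entry listed first, it would keep node 0.
print(parse_srat([0, 1, 1]))   # {0: 0, 1: 1}
```

Under this model the empty proximity domain simply never becomes a Linux node at boot, which matches the conclusion that such a guest cannot start with an offline node 0.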



-- 
Thanks,

David / dhildenb