Re: [PATCH] NUMA: Early use of cpu_to_node() returns 0 instead of the correct node id
From: Shijie Huang <hidden>
Date: 2024-01-19 08:51:12
Also in:
linux-arm-kernel, linux-mips, linux-riscv, lkml
在 2024/1/19 16:42, Mike Rapoport 写道:
On Fri, Jan 19, 2024 at 02:46:16PM +0800, Shijie Huang wrote:quoted
在 2024/1/19 12:42, Yury Norov 写道:quoted
This adds another level of indirection, I think. Currently cpu_to_node is a simple inliner. After the patch it would be a real function with all the associate overhead. Can you share a bloat-o-meter output here?#./scripts/bloat-o-meter vmlinux vmlinux.new add/remove: 6/1 grow/shrink: 61/51 up/down: 1168/-588 (580) Function old new delta numa_update_cpu 148 244 +96 ...................................................................................................................................(to many to skip) Total: Before=32990130, After=32990710, chg +0.00%It's not only about text size, the indirect call also hurts performance
The cpu_to_node() is called at very low frequency, most of the times is in the kernel booting time.
quoted
quoted
Regardless, I don't think that the approach is correct. As per your description, some initialization functions erroneously call cpu_to_node() instead of early_cpu_to_node() which exists specifically for that case. If the above correct, it's clearly a caller problem, and the fix is to simply switch all those callers to use early version.It is easy to change to early_cpu_to_node() for sched_init(), init_sched_fair_class() and workqueue_init_early(). These three places call the cpu_to_node() in the __init function. But it is a little hard to change the early_trace_init(), since it calls cpu_to_node in the deep function stack: early_trace_init() --> ring_buffer_alloc() -->rb_allocate_cpu_buffer() For early_trace_init(), we need to change more code. Anyway, If we think it is not a good idea to change the common code, I am oaky too.Is there a fundamental reason to have early_cpu_to_node() at all?
The early_cpu_to_node does not work on some ARCHs (which support the NUMA), such as SPARC, MIPS and S390. Thanks Huang Shijie
It seems that all the mappings are known by the end of setup_arch() and the initialization of numa_node can be moved earlier.quoted
quoted
I would also initialize the numa_node with NUMA_NO_NODE at declaration, so that if someone calls cpu_to_node() before the variable is properly initialized at runtime, he'll get NO_NODE, which is obviously an error.Even we set the numa_node with NUMA_NO_NODE, it does not always produce error. Please see the alloc_pages_node(). Thanks Huang Shijie