Re: [PATCH] NUMA: Early use of cpu_to_node() returns 0 instead of the correct node id
From: Shijie Huang <hidden>
Date: 2024-01-22 07:32:49
Also in:
linux-arm-kernel, linux-mips, linux-riscv, lkml
在 2024/1/20 2:02, Yury Norov 写道:
[EXTERNAL EMAIL NOTICE: This email originated from an external sender. Please be mindful of safe email handling and proprietary information protection practices.] On Fri, Jan 19, 2024 at 04:50:53PM +0800, Shijie Huang wrote:quoted
在 2024/1/19 16:42, Mike Rapoport 写道:quoted
On Fri, Jan 19, 2024 at 02:46:16PM +0800, Shijie Huang wrote:quoted
在 2024/1/19 12:42, Yury Norov 写道:quoted
This adds another level of indirection, I think. Currently cpu_to_node is a simple inliner. After the patch it would be a real function with all the associate overhead. Can you share a bloat-o-meter output here?#./scripts/bloat-o-meter vmlinux vmlinux.new add/remove: 6/1 grow/shrink: 61/51 up/down: 1168/-588 (580) Function old new delta numa_update_cpu 148 244 +96 ...................................................................................................................................(to many to skip) Total: Before=32990130, After=32990710, chg +0.00%It's not only about text size, the indirect call also hurts performanceThe cpu_to_node() is called at very low frequency, most of the times is in the kernel booting time.That doesn't matter. This function is a simple inliner that dereferences a pointer, and I believe all of us want to keep it simple.
Yes. I agree. I also want to keep it simple too.
quoted
quoted
quoted
quoted
Regardless, I don't think that the approach is correct. As per your description, some initialization functions erroneously call cpu_to_node() instead of early_cpu_to_node() which exists specifically for that case. If the above correct, it's clearly a caller problem, and the fix is to simply switch all those callers to use early version.It is easy to change to early_cpu_to_node() for sched_init(), init_sched_fair_class() and workqueue_init_early(). These three places call the cpu_to_node() in the __init function. But it is a little hard to change the early_trace_init(), since it calls cpu_to_node in the deep function stack: early_trace_init() --> ring_buffer_alloc() -->rb_allocate_cpu_buffer() For early_trace_init(), we need to change more code. Anyway, If we think it is not a good idea to change the common code, I am oaky too.Is there a fundamental reason to have early_cpu_to_node() at all?The early_cpu_to_node does not work on some ARCHs (which support the NUMA), such as SPARC, MIPS and S390.So, your approach wouldn't work either, right? I think you've got a testing bot report on it already...
IMHO, my patch works fine for them. They have their own cpu_to_node. The x86 reported an compiling error, because the x86 does not compile the driver/base/arch_numa.c. I have fixed it by moving the cpu_to_node from driver/base/arch_numa.c to driver/base/node.c The driver/base/node.c is built-in for all the NUMA ARCHs.
You can make it like this: #ifdef CONFIG_ARCH_NO_EARLY_CPU_TO_NODE #define early_cpu_to_node cpu_to_node #endif
Thanks. Add this make it more complicated..
quoted
quoted
It seems that all the mappings are known by the end of setup_arch() and the initialization of numa_node can be moved earlier.quoted
quoted
I would also initialize the numa_node with NUMA_NO_NODE at declaration, so that if someone calls cpu_to_node() before the variable is properly initialized at runtime, he'll get NO_NODE, which is obviously an error.Even we set the numa_node with NUMA_NO_NODE, it does not always produce error.You can print this error yourself: #ifndef cpu_to_node static inline int cpu_to_node(int cpu) { int node = per_cpu(numa_node, cpu); #ifdef CONFIG_DEBUG_PER_CPU_MAPS if (node == NUMA_NO_NODE) pr_err(...); #endif return node; } #endif
Thanks. I had a samiliar private to detect it. After my patch, there is no need to detect the error again. Thanks Huang Shijie