Thread (13 messages) 13 messages, 6 authors, 2024-01-22

Re: [PATCH] NUMA: Early use of cpu_to_node() returns 0 instead of the correct node id

From: Shijie Huang <hidden>
Date: 2024-01-22 07:32:49
Also in: linux-arm-kernel, linux-mips, linux-riscv, lkml

在 2024/1/20 2:02, Yury Norov 写道:
[EXTERNAL EMAIL NOTICE: This email originated from an external sender. Please be mindful of safe email handling and proprietary information protection practices.]


On Fri, Jan 19, 2024 at 04:50:53PM +0800, Shijie Huang wrote:
quoted
在 2024/1/19 16:42, Mike Rapoport 写道:
quoted
On Fri, Jan 19, 2024 at 02:46:16PM +0800, Shijie Huang wrote:
quoted
在 2024/1/19 12:42, Yury Norov 写道:
quoted
This adds another level of indirection, I think. Currently cpu_to_node
is a simple inliner. After the patch it would be a real function with
all the associate overhead. Can you share a bloat-o-meter output here?
#./scripts/bloat-o-meter vmlinux vmlinux.new
add/remove: 6/1 grow/shrink: 61/51 up/down: 1168/-588 (580)
Function                                     old     new   delta
numa_update_cpu                              148     244     +96

   ...................................................................................................................................(to many to skip)

Total: Before=32990130, After=32990710, chg +0.00%
It's not only about text size, the indirect call also hurts performance
The cpu_to_node() is called at very low frequency, most of the times is in
the kernel booting time.
That doesn't matter. This function is a simple inliner that dereferences
a pointer, and I believe all of us want to keep it simple.
Yes. I agree.

I also want to keep it simple too.

quoted
quoted
quoted
quoted
Regardless, I don't think that the approach is correct. As per your
description, some initialization functions erroneously call
cpu_to_node() instead of early_cpu_to_node() which exists specifically
for that case.

If the above correct, it's clearly a caller problem, and the fix is to
simply switch all those callers to use early version.
It is easy to change to early_cpu_to_node() for sched_init(),
init_sched_fair_class()

and workqueue_init_early(). These three places call the cpu_to_node() in the
__init function.


But it is a little hard to change the early_trace_init(), since it calls
cpu_to_node in the deep

function stack:

    early_trace_init() --> ring_buffer_alloc() -->rb_allocate_cpu_buffer()


For early_trace_init(), we need to change more code.


Anyway, If we think it is not a good idea to change the common code, I am
oaky too.
Is there a fundamental reason to have early_cpu_to_node() at all?
The early_cpu_to_node does not work on some ARCHs (which support the NUMA),
such

as  SPARC, MIPS and S390.
So, your approach wouldn't work either, right? I think you've got a
testing bot report on it already...
IMHO, my patch works fine for them.

They have their own cpu_to_node.


The x86 reported an compiling error, because the x86 does not compile

the driver/base/arch_numa.c.

I have fixed it by moving the cpu_to_node from

      driver/base/arch_numa.c to driver/base/node.c


The driver/base/node.c is  built-in for all the NUMA ARCHs.
You can make it like this:

   #ifdef CONFIG_ARCH_NO_EARLY_CPU_TO_NODE
   #define early_cpu_to_node cpu_to_node
   #endif
Thanks. Add this make it more complicated..

quoted
quoted
It seems that all the mappings are known by the end of setup_arch() and the
initialization of numa_node can be moved earlier.
quoted
quoted
I would also initialize the numa_node with NUMA_NO_NODE at declaration,
so that if someone calls cpu_to_node() before the variable is properly
initialized at runtime, he'll get NO_NODE, which is obviously an error.
Even we set the numa_node with NUMA_NO_NODE, it does not always produce
error.
You can print this error yourself:

   #ifndef cpu_to_node
   static inline int cpu_to_node(int cpu)
   {
         int node = per_cpu(numa_node, cpu);

   #ifdef CONFIG_DEBUG_PER_CPU_MAPS
         if (node == NUMA_NO_NODE)
                 pr_err(...);
   #endif

           return node;
   }
   #endif
Thanks.  I had a samiliar private to detect it.

After my patch, there is no need to detect the error again.


Thanks

Huang Shijie
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help