Re: [PATCH] NUMA: Early use of cpu_to_node() returns 0 instead of the correct node id

From: Shijie Huang <hidden>
Date: 2024-01-19 08:51:12
Also in: linux-arm-kernel, linux-mips, linux-riscv, lkml

在 2024/1/19 16:42, Mike Rapoport 写道:

On Fri, Jan 19, 2024 at 02:46:16PM +0800, Shijie Huang wrote:

quoted

在 2024/1/19 12:42, Yury Norov 写道:

quoted

This adds another level of indirection, I think. Currently cpu_to_node
is a simple inliner. After the patch it would be a real function with
all the associate overhead. Can you share a bloat-o-meter output here?

#./scripts/bloat-o-meter vmlinux vmlinux.new
add/remove: 6/1 grow/shrink: 61/51 up/down: 1168/-588 (580)
Function                                     old     new   delta
numa_update_cpu                              148     244     +96

  ...................................................................................................................................(to many to skip)

Total: Before=32990130, After=32990710, chg +0.00%

  
It's not only about text size, the indirect call also hurts performance

The cpu_to_node() is called at very low frequency, most of the times is 
in the kernel booting time.

quoted

Regardless, I don't think that the approach is correct. As per your
description, some initialization functions erroneously call
cpu_to_node() instead of early_cpu_to_node() which exists specifically
for that case.

If the above correct, it's clearly a caller problem, and the fix is to
simply switch all those callers to use early version.

It is easy to change to early_cpu_to_node() for sched_init(),
init_sched_fair_class()

and workqueue_init_early(). These three places call the cpu_to_node() in the
__init function.


But it is a little hard to change the early_trace_init(), since it calls
cpu_to_node in the deep

function stack:

   early_trace_init() --> ring_buffer_alloc() -->rb_allocate_cpu_buffer()


For early_trace_init(), we need to change more code.


Anyway, If we think it is not a good idea to change the common code, I am
oaky too.

  
Is there a fundamental reason to have early_cpu_to_node() at all?

The early_cpu_to_node does not work on some ARCHs (which support the 
NUMA), such

as  SPARC, MIPS and S390.


Thanks

Huang Shijie

It seems that all the mappings are known by the end of setup_arch() and the
initialization of numa_node can be moved earlier.

quoted

I would also initialize the numa_node with NUMA_NO_NODE at declaration,
so that if someone calls cpu_to_node() before the variable is properly
initialized at runtime, he'll get NO_NODE, which is obviously an error.

Even we set the numa_node with NUMA_NO_NODE, it does not always produce
error.

Please see the alloc_pages_node().


Thanks

Huang Shijie

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help