Thread: 36 messages, 10 authors, 2021-02-10

Re: [RFC PATCH v0] mm/slub: Let number of online CPUs determine the slub page order

From: Vincent Guittot <vincent.guittot@linaro.org>
Date: 2021-01-22 08:04:56
Also in: lkml

On Thu, 21 Jan 2021 at 19:19, Vlastimil Babka [off-list ref] wrote:
On 1/21/21 11:01 AM, Christoph Lameter wrote:
On Thu, 21 Jan 2021, Bharata B Rao wrote:
The problem is that calculate_order() is called a number of times
before the secondary CPUs are booted, and num_online_cpus() returns 1
instead of 224 at that point. This makes the use of num_online_cpus()
irrelevant for those cases.

After adding "slub_min_objects=36" to my kernel command line, which equals
4 * (fls(num_online_cpus()) + 1) with a correct num_online_cpus() == 224,
the regression disappears:

9 iterations of hackbench -l 16000 -g 16: 3.201 sec (+/- 0.90%)
I'm surprised that hackbench is that sensitive to slab performance, anyway. It's
supposed to be a scheduler benchmark? What exactly is going on?
From the hackbench description:
Hackbench is both a benchmark and a stress test for the Linux kernel
scheduler. Its main job is to create a specified number of pairs of
schedulable entities (either threads or traditional processes) which
communicate via either sockets or pipes and time how long it takes for
each pair to send data back and forth.
Should we have switched to num_present_cpus() rather than
num_online_cpus()? If so, the below patch should address the
above problem.
There is certainly an initcall after the secondary CPUs are booted where we
could redo calculate_order()?
We could do it even in the hotplug handler. But in practice that means making
sure it's safe, i.e. all users of oo_order()/oo_objects() must handle the value
changing.

Consider e.g. init_cache_random_seq(), which uses oo_objects(s->oo) to allocate
s->random_seq when cache s is created. Then shuffle_freelist() will use the
current value of oo_objects(s->oo) to index s->random_seq for a newly allocated
slab. What if the page order has increased in the meantime due to secondary CPUs
booting or hotplug? Array overflow. That's why I just made the former sysfs
handler for changing the order read-only.

Things would be easier if we could trust *on all arches* either

- num_present_cpus() to count what the hardware physically has during
boot, even if not yet onlined, at the time we init slab. This would still not
handle later hotplug (probably mostly a VM scenario; not that somebody would
bring a bunch of actual new CPU boards to a running bare-metal system?).

- num_possible_cpus()/nr_cpu_ids not to be excessive (broken BIOS?) on systems
where it's not really possible to plug in more CPUs. In a VM scenario we could
still have the opposite problem, where theoretically "anything is possible" but
the virtual CPUs are never added later.
On all the systems that I have tested, num_possible_cpus()/nr_cpu_ids
were correctly initialized:

- large arm64 ACPI system
- small arm64 DT-based system
- VM on an x86 system
We could also start questioning the very assumption that the number of CPUs
should affect the slab page size in the first place. Should it? After all, each
CPU will have one or more slab pages privately cached, as we discuss in the
other thread... So why make the slab pages larger as well?
Or num_online_cpus() needs to be up to date earlier. Why does this issue
not occur on x86? Does x86 have an up-to-date num_online_cpus() earlier?
  