Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: 2021-07-02 18:29:50
I replaced my slub changes with slub-local-lock-v2r3.
I haven't seen any complaints from lockdep or so, which is good. Then I did
this with RT enabled (and no debug):

- A "time make -j32" run of allmodconfig on /dev/shm.
  Old:
| real    20m6,217s
| user    568m22,553s
| sys     48m33,126s

  New:
| real    20m9,049s
| user    569m32,096s
| sys     48m47,670s

  These 3 seconds here are probably in the noise range.

- perf_5.10 stat -r 10 hackbench -g200 -s 4096 -l500
  Old:
|         464.967,20 msec task-clock                #   27,220 CPUs utilized            ( +-  0,16% )
|          7.683.944      context-switches          #    0,017 M/sec                    ( +-  0,86% )
|            931.380      cpu-migrations            #    0,002 M/sec                    ( +-  4,94% )
|            219.569      page-faults               #    0,472 K/sec                    ( +-  0,39% )
|  1.104.727.599.918      cycles                    #    2,376 GHz                      ( +-  0,18% )
|    941.428.898.087      stalled-cycles-frontend   #   85,22% frontend cycles idle     ( +-  0,24% )
|    729.016.546.572      stalled-cycles-backend    #   65,99% backend cycles idle      ( +-  0,32% )
|    340.133.571.519      instructions              #    0,31  insn per cycle
|                                                   #    2,77  stalled cycles per insn  ( +-  0,12% )
|     73.746.821.314      branches                  #  158,607 M/sec                    ( +-  0,13% )
|        377.838.006      branch-misses             #    0,51% of all branches          ( +-  1,01% )
|
|            17,0820 +- 0,0202 seconds time elapsed  ( +-  0,12% )

  New:
|         422.865,71 msec task-clock                #    4,782 CPUs utilized            ( +-  0,34% )
|         14.594.238      context-switches          #    0,035 M/sec                    ( +-  0,43% )
|          3.737.926      cpu-migrations            #    0,009 M/sec                    ( +-  0,46% )
|            218.474      page-faults               #    0,517 K/sec                    ( +-  0,74% )
|    940.715.812.020      cycles                    #    2,225 GHz                      ( +-  0,34% )
|    716.593.827.820      stalled-cycles-frontend   #   76,18% frontend cycles idle     ( +-  0,39% )
|    550.730.862.839      stalled-cycles-backend    #   58,54% backend cycles idle      ( +-  0,43% )
|    417.274.588.907      instructions              #    0,44  insn per cycle
|                                                   #    1,72  stalled cycles per insn  ( +-  0,17% )
|     92.814.150.290      branches                  #  219,488 M/sec                    ( +-  0,17% )
|        822.102.170      branch-misses             #    0,89% of all branches          ( +-  0,41% )
|
|             88,427 +- 0,618 seconds time elapsed  ( +-  0,70% )

So this is outside of the noise range.
I'm not sure where this is coming from.
My guess would be higher lock contention within the memory allocator.
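
For reference, the size of both deltas can be worked out from the numbers quoted above. This is only a minimal sketch of the arithmetic: the helper names (`to_seconds`, `relative_change`) are made up for illustration, and the inputs are copied by hand from the "real", "task-clock" and "time elapsed" lines, with the decimal commas converted to points.

```python
# Hypothetical helpers; the values are copied from the output above.

def to_seconds(minutes, seconds):
    """Convert a 'real XmY,ZZZs' reading to seconds."""
    return minutes * 60 + seconds

def relative_change(old, new):
    """Fractional change from old to new (0.01 == +1%)."""
    return (new - old) / old

# Kernel build, wall clock ("real").
build_old = to_seconds(20, 6.217)
build_new = to_seconds(20, 9.049)
build_delta = relative_change(build_old, build_new)

# hackbench, "seconds time elapsed".
hb_old = 17.0820
hb_new = 88.427
hb_slowdown = hb_new / hb_old

# "CPUs utilized" is task-clock divided by elapsed wall time.
cpus_old = 464967.20 / 1000 / hb_old
cpus_new = 422865.71 / 1000 / hb_new

print(f"build:     {build_delta * 100:+.2f}% wall time")
print(f"hackbench: {hb_slowdown:.2f}x wall time")
print(f"CPUs utilized: {cpus_old:.2f} -> {cpus_new:.2f}")
```

The build differs by well under one percent, while hackbench takes roughly five times as long in wall time even though its total task-clock went down, so the CPUs were utilized far less in the new run.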

> The remaining patches to upstream from the RT tree are small ones related to
> KConfig. The patch that restricts PREEMPT_RT to SLUB (not SLAB or SLOB) makes
> sense. The patch that disables CONFIG_SLUB_CPU_PARTIAL with PREEMPT_RT could
> perhaps be re-evaluated as the series also addresses some latency issues with
> percpu partial slabs.

With that series the PARTIAL slab can indeed be enabled. I have (had) a
half-done series where I had PARTIAL enabled and noticed a slight
increase in latency, so I made it "default y on !RT". It wasn't dramatic but
appeared to be outside of the noise.

Sebastian