Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible
From: Vlastimil Babka <hidden>
Date: 2021-07-18 07:42:44
Also in: lkml
On 7/3/21 9:24 AM, Mike Galbraith wrote:
On Fri, 2021-07-02 at 20:29 +0200, Sebastian Andrzej Siewior wrote:
I replaced my slub changes with slub-local-lock-v2r3. I haven't seen any complaints from lockdep or so, which is good. Then I did this with RT enabled (and no debug):
Below is some raw hackbench data from my little i4790 desktop box. It says we'll definitely still want list_lock to be raw.
Hi Mike,
thanks a lot for the testing, and sorry for the late reply. Did you try, instead of a raw list_lock, not applying the last patch (the local lock one), as I suggested in my reply to bigeasy? I think its impact on reducing the RT-specific overhead would be larger than that of a raw list_lock, the result should still be RT compatible, and it would also deal with the bugs you found there (which I'll look into).
Thanks,
Vlastimil
It also appears to be saying that there's something RT specific to
stare at in addition to the list_lock business, but add a pinch of salt
to that due to the config of the virgin(ish) tip tree being much
lighter than the enterprise(ish) config of the tip-rt tree.
perf stat -r10 hackbench -s4096 -l500
full warmup, record, repeat twice for elapsed
5.13.0.g60ab3ed-tip-rt
8,898.51 msec task-clock # 7.525 CPUs utilized ( +- 0.33% )
368,922 context-switches # 0.041 M/sec ( +- 5.20% )
42,281 cpu-migrations # 0.005 M/sec ( +- 5.28% )
13,180 page-faults # 0.001 M/sec ( +- 0.70% )
33,343,378,867 cycles # 3.747 GHz ( +- 0.30% )
21,656,783,887 instructions # 0.65 insn per cycle ( +- 0.67% )
4,408,569,663 branches # 495.428 M/sec ( +- 0.73% )
12,040,125 branch-misses # 0.27% of all branches ( +- 2.93% )
1.18260 +- 0.00473 seconds time elapsed ( +- 0.40% )
1.19018 +- 0.00441 seconds time elapsed ( +- 0.37% ) (repeat)
1.18260 +- 0.00473 seconds time elapsed ( +- 0.40% ) (repeat)
5.13.0.g60ab3ed-tip-rt +slub-local-lock-v2r3 list_lock=raw_spinlock_t
9,642.00 msec task-clock # 7.521 CPUs utilized ( +- 0.46% )
462,091 context-switches # 0.048 M/sec ( +- 4.79% )
44,411 cpu-migrations # 0.005 M/sec ( +- 4.34% )
12,980 page-faults # 0.001 M/sec ( +- 0.43% )
36,098,859,429 cycles # 3.744 GHz ( +- 0.44% )
25,462,853,462 instructions # 0.71 insn per cycle ( +- 0.50% )
5,260,898,360 branches # 545.623 M/sec ( +- 0.52% )
16,088,686 branch-misses # 0.31% of all branches ( +- 2.02% )
1.28207 +- 0.00568 seconds time elapsed ( +- 0.44% )
1.28744 +- 0.00713 seconds time elapsed ( +- 0.55% ) (repeat)
1.28085 +- 0.00850 seconds time elapsed ( +- 0.66% ) (repeat)
5.13.0.g60ab3ed-tip-rt +slub-local-lock-v2r3 list_lock=spinlock_t
10,004.89 msec task-clock # 6.029 CPUs utilized ( +- 1.37% )
654,311 context-switches # 0.065 M/sec ( +- 5.16% )
211,070 cpu-migrations # 0.021 M/sec ( +- 1.38% )
13,262 page-faults # 0.001 M/sec ( +- 0.79% )
36,585,914,931 cycles # 3.657 GHz ( +- 1.35% )
27,682,240,511 instructions # 0.76 insn per cycle ( +- 1.06% )
5,766,064,432 branches # 576.325 M/sec ( +- 1.11% )
24,269,069 branch-misses # 0.42% of all branches ( +- 2.03% )
1.6595 +- 0.0116 seconds time elapsed ( +- 0.70% )
1.6270 +- 0.0180 seconds time elapsed ( +- 1.11% ) (repeat)
1.6213 +- 0.0150 seconds time elapsed ( +- 0.93% ) (repeat)
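To put the three RT runs side by side, the first elapsed-time figure of each block can be converted into a relative slowdown against the unpatched tip-rt run. The following is only a minimal sketch in Python: the dictionary labels and the helper name `slowdown_pct` are made up for illustration, and only the first (non-repeat) elapsed value of each block is used.

```python
# First "seconds time elapsed" value from each of the three RT blocks above.
elapsed = {
    "tip-rt": 1.18260,
    "tip-rt +v2r3 list_lock=raw_spinlock_t": 1.28207,
    "tip-rt +v2r3 list_lock=spinlock_t": 1.6595,
}


def slowdown_pct(value, baseline):
    """Return how much larger value is than baseline, in percent."""
    return (value / baseline - 1.0) * 100.0


base = elapsed["tip-rt"]
for name, secs in elapsed.items():
    # Print each run with its elapsed time and slowdown relative to tip-rt.
    print(f"{name:40s} {secs:.5f} s  {slowdown_pct(secs, base):+6.1f}%")
```

Read this way, the raw_spinlock_t variant costs roughly 8% in elapsed time over plain tip-rt, while the spinlock_t variant costs roughly 40%, which is the gap behind the remark that list_lock should stay raw.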
virgin(ish) tip
5.13.0.g60ab3ed-tip
7,320.67 msec task-clock # 7.792 CPUs utilized ( +- 0.31% )
221,215 context-switches # 0.030 M/sec ( +- 3.97% )
16,234 cpu-migrations # 0.002 M/sec ( +- 4.07% )
13,233 page-faults # 0.002 M/sec ( +- 0.91% )
27,592,205,252 cycles # 3.769 GHz ( +- 0.32% )
8,309,495,040 instructions # 0.30 insn per cycle ( +- 0.37% )
1,555,210,607 branches # 212.441 M/sec ( +- 0.42% )
5,484,209 branch-misses # 0.35% of all branches ( +- 2.13% )
0.93949 +- 0.00423 seconds time elapsed ( +- 0.45% )
0.94608 +- 0.00384 seconds time elapsed ( +- 0.41% ) (repeat)
0.94422 +- 0.00410 seconds time elapsed ( +- 0.43% ) (repeat)
5.13.0.g60ab3ed-tip +slub-local-lock-v2r3
7,343.57 msec task-clock # 7.776 CPUs utilized ( +- 0.44% )
223,044 context-switches # 0.030 M/sec ( +- 3.02% )
16,057 cpu-migrations # 0.002 M/sec ( +- 4.03% )
13,164 page-faults # 0.002 M/sec ( +- 0.97% )
27,684,906,017 cycles # 3.770 GHz ( +- 0.45% )
8,323,273,871 instructions # 0.30 insn per cycle ( +- 0.28% )
1,556,106,680 branches # 211.901 M/sec ( +- 0.31% )
5,463,468 branch-misses # 0.35% of all branches ( +- 1.33% )
0.94440 +- 0.00352 seconds time elapsed ( +- 0.37% )
0.94830 +- 0.00228 seconds time elapsed ( +- 0.24% ) (repeat)
0.93813 +- 0.00440 seconds time elapsed ( +- 0.47% ) (repeat)
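For the two non-RT runs, a quick way to judge whether the patched kernel differs at all is to compare the difference in elapsed time with the spread perf reports. The sketch below is a rough check, not a proper statistical test: it assumes the "+-" figure can be treated as the uncertainty of each mean, and the names `tip`, `tip_patched` and `diff_in_sigmas` are hypothetical.

```python
import math

# (mean seconds, reported +-) from the first elapsed line of each non-RT block.
tip = (0.93949, 0.00423)
tip_patched = (0.94440, 0.00352)


def diff_in_sigmas(a, b):
    """Return (difference, combined spread, difference / combined spread)."""
    delta = b[0] - a[0]
    # Combine the two reported spreads in quadrature (assumes independence).
    combined = math.sqrt(a[1] ** 2 + b[1] ** 2)
    return delta, combined, delta / combined


delta, combined, ratio = diff_in_sigmas(tip, tip_patched)
print(f"difference: {delta * 1000:.2f} ms, combined spread: {combined * 1000:.2f} ms")
print(f"ratio: {ratio:.2f}")
```

The difference of about 4.9 ms is smaller than the combined spread of about 5.5 ms, so under these assumptions the non-RT numbers with and without slub-local-lock-v2r3 are indistinguishable.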