Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible

[RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 05/34] mm, slub: remove redundant unfreeze_partials() from put_cpu_partial() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 02/34] mm, slub: allocate private object map for sysfs listings · Vlastimil Babka <hidden> · 2021-06-09
Re: [RFC v2 02/34] mm, slub: allocate private object map for sysfs listings · Christoph Lameter <hidden> · 2021-06-09
[RFC v2 01/34] mm, slub: don't call flush_all() from list_locations() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 04/34] mm, slub: don't disable irq for debug_check_no_locks_freed() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 03/34] mm, slub: allocate private object map for validate_slab_cache() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 06/34] mm, slub: unify cmpxchg_double_slab() and __cmpxchg_double_slab() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 07/34] mm, slub: extract get_partial() from new_slab_objects() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 08/34] mm, slub: dissolve new_slab_objects() into ___slab_alloc() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 09/34] mm, slub: return slab page from get_partial() and set c->page afterwards · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 10/34] mm, slub: restructure new page checks in ___slab_alloc() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 12/34] mm, slub: move disabling/enabling irqs to ___slab_alloc() · Vlastimil Babka <hidden> · 2021-06-09
Re: [RFC v2 12/34] mm, slub: move disabling/enabling irqs to ___slab_alloc() · Mike Galbraith <hidden> · 2021-07-06
[RFC v2 16/34] mm, slub: validate slab from partial list or page allocator before making it cpu slab · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 11/34] mm, slub: simplify kmem_cache_cpu and tid setup · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 18/34] mm, slub: stop disabling irqs around get_partial() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 20/34] mm, slub: make locking in deactivate_slab() irq-safe · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 21/34] mm, slub: call deactivate_slab() without disabling irqs · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 14/34] mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 13/34] mm, slub: do initial checks in ___slab_alloc() with irqs enabled · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 17/34] mm, slub: check new pages with restored irqs · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 22/34] mm, slub: move irq control into unfreeze_partials() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 24/34] mm, slub: detach whole partial list at once in unfreeze_partials() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 23/34] mm, slub: discard slabs in unfreeze_partials() without irqs disabled · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 19/34] mm, slub: move reset of c->page and freelist out of deactivate_slab() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 28/34] mm, slab: make flush_slab() possible to call with irqs enabled · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 27/34] mm, slub: don't disable irqs in slub_cpu_dead() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 15/34] mm, slub: restore irqs around calling new_slab() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 25/34] mm, slub: detach percpu partial list in unfreeze_partials() using this_cpu_cmpxchg() · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 29/34] mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context · Vlastimil Babka <hidden> · 2021-06-09
Re: [RFC v2 29/34] mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context · Cyrill Gorcunov <hidden> · 2021-06-09
Re: [RFC v2 29/34] mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context · Vlastimil Babka <hidden> · 2021-06-10
Re: [RFC v2 29/34] mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context · Cyrill Gorcunov <hidden> · 2021-06-10
Re: [RFC v2 29/34] mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context · Hillf Danton <hidden> · 2021-07-07
[RFC v2 30/34] mm: slub: Make object_map_lock a raw_spinlock_t · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 31/34] mm, slub: optionally save/restore irqs in slab_[un]lock()/ · Vlastimil Babka <hidden> · 2021-06-09
Re: [RFC v2 31/34] mm, slub: optionally save/restore irqs in slab_[un]lock()/ · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2021-07-02
[RFC v2 33/34] mm, slub: use migrate_disable() on PREEMPT_RT · Vlastimil Babka <hidden> · 2021-06-09
Re: [RFC v2 33/34] mm, slub: use migrate_disable() on PREEMPT_RT · Vlastimil Babka <hidden> · 2021-06-14
Re: [RFC v2 33/34] mm, slub: use migrate_disable() on PREEMPT_RT · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2021-06-14
Re: [RFC v2 33/34] mm, slub: use migrate_disable() on PREEMPT_RT · Vlastimil Babka <hidden> · 2021-06-14
Re: [RFC v2 33/34] mm, slub: use migrate_disable() on PREEMPT_RT · Vlastimil Babka <hidden> · 2021-06-14
Re: [RFC v2 33/34] mm, slub: use migrate_disable() on PREEMPT_RT · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2021-06-14
[RFC v2 32/34] mm, slub: make slab_lock() disable irqs with PREEMPT_RT · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 34/34] mm, slub: convert kmem_cpu_slab protection to local_lock · Vlastimil Babka <hidden> · 2021-06-09
[RFC v2 26/34] mm, slub: only disable irq with spin_lock in __unfreeze_partials() · Vlastimil Babka <hidden> · 2021-06-09
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Mel Gorman <hidden> · 2021-06-14
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Mel Gorman <hidden> · 2021-06-14
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Vlastimil Babka <hidden> · 2021-06-14
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2021-07-02
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Vlastimil Babka <hidden> · 2021-07-02
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2021-07-29
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Vlastimil Babka <hidden> · 2021-07-29
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Sebastian Andrzej Siewior <bigeasy@linutronix.de> · 2021-07-29
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Mike Galbraith <hidden> · 2021-07-03
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Mike Galbraith <hidden> · 2021-07-03
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Mike Galbraith <hidden> · 2021-07-04
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Vlastimil Babka <hidden> · 2021-07-18
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Mike Galbraith <hidden> · 2021-07-18
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Mike Galbraith <hidden> · 2021-07-18
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Mike Galbraith <hidden> · 2021-07-05
Re: [RFC v2 00/34] SLUB: reduce irq disabled scope and make it RT compatible · Mike Galbraith <hidden> · 2021-07-06

From: Vlastimil Babka <hidden>
Date: 2021-07-02 20:25:36
Also in: lkml

On 7/2/21 8:29 PM, Sebastian Andrzej Siewior wrote:

I replaced my slub changes with slub-local-lock-v2r3.
I haven't seen any complaints from lockdep or the like, which is good. Then I
did this with RT enabled (and no debug):

Thanks for testing!

- A "time make -j32" run of allmodconfig on /dev/shm.
  Old:
| real    20m6,217s
| user    568m22,553s
| sys     48m33,126s

  New:
| real    20m9,049s
| user    569m32,096s
| sys     48m47,670s

  These 3 seconds here are probably in the noise range.

- perf_5.10 stat -r 10 hackbench -g200 -s 4096 -l500
Old:
|         464.967,20 msec task-clock                #   27,220 CPUs utilized            ( +-  0,16% )
|          7.683.944      context-switches          #    0,017 M/sec                    ( +-  0,86% )
|            931.380      cpu-migrations            #    0,002 M/sec                    ( +-  4,94% )
|            219.569      page-faults               #    0,472 K/sec                    ( +-  0,39% )
|  1.104.727.599.918      cycles                    #    2,376 GHz                      ( +-  0,18% )
|    941.428.898.087      stalled-cycles-frontend   #   85,22% frontend cycles idle     ( +-  0,24% )
|    729.016.546.572      stalled-cycles-backend    #   65,99% backend cycles idle      ( +-  0,32% )
|    340.133.571.519      instructions              #    0,31  insn per cycle
|                                                   #    2,77  stalled cycles per insn  ( +-  0,12% )
|     73.746.821.314      branches                  #  158,607 M/sec                    ( +-  0,13% )
|        377.838.006      branch-misses             #    0,51% of all branches          ( +-  1,01% )
| 
|            17,0820 +- 0,0202 seconds time elapsed  ( +-  0,12% )

New:
|         422.865,71 msec task-clock                #    4,782 CPUs utilized            ( +-  0,34% )
|         14.594.238      context-switches          #    0,035 M/sec                    ( +-  0,43% )
|          3.737.926      cpu-migrations            #    0,009 M/sec                    ( +-  0,46% )
|            218.474      page-faults               #    0,517 K/sec                    ( +-  0,74% )
|    940.715.812.020      cycles                    #    2,225 GHz                      ( +-  0,34% )
|    716.593.827.820      stalled-cycles-frontend   #   76,18% frontend cycles idle     ( +-  0,39% )
|    550.730.862.839      stalled-cycles-backend    #   58,54% backend cycles idle      ( +-  0,43% )
|    417.274.588.907      instructions              #    0,44  insn per cycle
|                                                   #    1,72  stalled cycles per insn  ( +-  0,17% )
|     92.814.150.290      branches                  #  219,488 M/sec                    ( +-  0,17% )
|        822.102.170      branch-misses             #    0,89% of all branches          ( +-  0,41% )
| 
|             88,427 +- 0,618 seconds time elapsed  ( +-  0,70% )

So this is outside of the noise range.
I'm not sure where this is coming from. My guess would be higher lock
contention within the memory allocator.

The series shouldn't significantly change the memory allocator
interaction, though.
There seem to be fewer cycles but more time elapsed, and thus more sleeping.
Is it locks becoming mutexes on RT?
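
As a rough cross-check of that reading, the derived figures can be
recomputed from the two perf stat summaries above. This is only a sketch in
Python: the helper names (parse_number, summarize) are made up for
illustration, and the input strings are copied from the output above, which
prints '.' as the thousands separator and ',' as the decimal mark.

```python
def parse_number(text: str) -> float:
    """Parse a number printed with '.' as thousands separator and ',' as decimal mark."""
    return float(text.replace(".", "").replace(",", "."))


# Values copied verbatim from the "Old" and "New" perf stat summaries.
runs = {
    "old": {
        "task_clock_ms": "464.967,20",
        "context_switches": "7.683.944",
        "cycles": "1.104.727.599.918",
        "elapsed_s": "17,0820",
    },
    "new": {
        "task_clock_ms": "422.865,71",
        "context_switches": "14.594.238",
        "cycles": "940.715.812.020",
        "elapsed_s": "88,427",
    },
}


def summarize(run: dict) -> dict:
    """Convert the raw strings to floats and derive the CPU utilization."""
    task_clock_s = parse_number(run["task_clock_ms"]) / 1000.0
    elapsed_s = parse_number(run["elapsed_s"])
    return {
        "elapsed_s": elapsed_s,
        "cycles": parse_number(run["cycles"]),
        "context_switches": parse_number(run["context_switches"]),
        # CPU time consumed divided by wall-clock time.
        "cpus_utilized": task_clock_s / elapsed_s,
    }


old = summarize(runs["old"])
new = summarize(runs["new"])

slowdown = new["elapsed_s"] / old["elapsed_s"]
cycle_ratio = new["cycles"] / old["cycles"]
ctx_switch_ratio = new["context_switches"] / old["context_switches"]

print(f"CPUs utilized:    old {old['cpus_utilized']:.2f}, new {new['cpus_utilized']:.2f}")
print(f"elapsed time:     x{slowdown:.2f}")
print(f"cycles:           x{cycle_ratio:.2f}")
print(f"context switches: x{ctx_switch_ratio:.2f}")
```

With these inputs the elapsed time grows by roughly 5.2x, while total cycles
drop by about 15% and context switches nearly double. That is consistent
with tasks spending more time blocked rather than burning CPU, but it does
not by itself say which lock is responsible.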

My first guess is the last patch, the local_lock conversion. What would
happen if you take that one out? The series should still be RT-compatible
without it. If that improves things a lot, maybe the conversion to
local_lock is not worth it.

My second guess: list_lock remains a spinlock with my series, and thus an
RT mutex on PREEMPT_RT, but the current RT tree converts it to a
raw_spinlock. I'd hope that leaving it as a non-raw spinlock would still be
much better for RT goals, even if hackbench throughput (hackbench is AFAIK
very slab intensive) regresses, hopefully not by much.

The remaining patches to upstream from the RT tree are small ones related to
Kconfig. The patch that restricts PREEMPT_RT to SLUB (not SLAB or SLOB) makes
sense. The patch that disables CONFIG_SLUB_CPU_PARTIAL with PREEMPT_RT could
perhaps be re-evaluated as the series also addresses some latency issues with
percpu partial slabs.

With that series, CONFIG_SLUB_CPU_PARTIAL can indeed be enabled. I have (had) a
half-done series where I had it enabled and noticed a slight increase in
latency, so I made it "default y on !RT". It wasn't dramatic, but it
appeared to be outside of the noise.

Sebastian
