Do we really need SLOB nowadays?

From: Hyeonggon Yoo <hidden>
Date: 2021-10-17 13:57:18
Also in: lkml

On Sun, Oct 17, 2021 at 01:36:18PM +0000, Hyeonggon Yoo wrote:

On Sun, Oct 17, 2021 at 04:28:52AM +0000, Hyeonggon Yoo wrote:


I've been reading the SLUB/SLOB code for a while. SLUB recently became
real-time compatible by reducing its locking area.

For now, SLUB is the only slab allocator for PREEMPT_RT, because
it works better than SLAB on RT and SLOB uses a non-deterministic method,
sequential fit.

But the memory usage of SLUB is too high for systems with low memory.
So in my local repository I made SLOB use a segregated free list
method, which is more deterministic, to provide bounded latency.

This can be done by managing a list of partial pages globally
for every power-of-two size (8, 16, 32, ..., PAGE_SIZE) per NUMA node.
The minimal allocation size is the size of a pointer, so that a free object
can hold the pointer to the next free object, like SLUB.

By making all objects in the same page have the same size, there's no
need to iterate over free blocks in a page. (Iterating over pages isn't
needed either.)

Some cleanups and more tests (especially with NUMA/RT configs) are needed,
but I want to hear your opinion about the idea. I did not test on RT yet.

Below are the benchmark results and memory usage (on !RT).
With a 13% increase in memory usage, it's nine times faster, has
bounded fragmentation, and importantly provides predictable execution time.

Hello linux-mm, I improved it, and it now uses less memory
and is 9x~13x faster than the original SLOB. It shows much less fragmentation
after hackbench.

Rather than managing a global freelist for each power-of-two size,
I made each kmem_cache manage its own freelist (one per NUMA node) and
added support for slab merging. So it looks quite like a lightweight SLUB now.
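
A rough userspace sketch of that layout follows. It is an illustration under
stated assumptions, not the RFC code: all sketch_* names are hypothetical,
the node count is fixed at 2, and the merge rule used here (same rounded
object size and same alignment) is a simplification chosen for the example,
not necessarily the rule the patch or SLUB applies. Each cache carries one
partial-page list per node, and creating a cache that matches an existing
one returns the existing cache with its reference count raised.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_NODES  2   /* assumption: illustrative NUMA node count */
#define SKETCH_MAX_CACHES 16  /* assumption: fixed table for the sketch */

/* Per-node state: each cache keeps its own partial-page list per node. */
struct sketch_node {
    void *partial;            /* head of this node's partially used pages */
};

struct sketch_cache {
    const char *name;
    size_t obj_size;          /* object size after rounding */
    size_t align;
    int refcount;             /* number of users sharing this cache */
    struct sketch_node node[SKETCH_MAX_NODES];
};

static struct sketch_cache sketch_caches[SKETCH_MAX_CACHES];
static int sketch_nr_caches;

/* Round size up to a multiple of align (align must be a power of two). */
static size_t sketch_round_up(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}

/*
 * Create a cache, or merge with an existing one. The merge criterion here
 * (equal rounded size and equal alignment) is an assumption of this sketch.
 */
static struct sketch_cache *sketch_cache_create(const char *name,
                                                size_t size, size_t align)
{
    size_t min = sizeof(void *);  /* room for the next-free pointer */
    size_t obj_size = sketch_round_up(size < min ? min : size, align);
    struct sketch_cache *c;

    for (int i = 0; i < sketch_nr_caches; i++) {
        c = &sketch_caches[i];
        if (c->obj_size == obj_size && c->align == align) {
            c->refcount++;        /* merged: share the existing cache */
            return c;
        }
    }

    if (sketch_nr_caches == SKETCH_MAX_CACHES)
        return NULL;

    c = &sketch_caches[sketch_nr_caches++];
    c->name = name;
    c->obj_size = obj_size;
    c->align = align;
    c->refcount = 1;
    for (int n = 0; n < SKETCH_MAX_NODES; n++)
        c->node[n].partial = NULL;
    return c;
}
```

With 8-byte alignment, a 36-byte and a 40-byte cache both round to 40 bytes
and end up sharing one sketch_cache, which is the memory saving that merging
is after.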

I'll send an RFC patch after some testing and code cleanup.

I think it is more RT-friendly because it uses a more deterministic
algorithm (but the lock is still shared among CPUs). Any opinions for RT?

Hi there. After some thinking, I have a new question:
if a lightweight SLUB is better than SLOB,
do we really need SLOB nowadays?

And one more question:
    In Christoph's presentation [1], it says SLOB uses
    300 KB of memory, but on my system it uses almost 8000 KB.
    What is the difference?

[1] https://events.static.linuxfound.org/sites/events/files/slides/slaballocators.pdf

SLUB without cpu partials:

memory usage:
   after boot:
       Slab:               8672 kB
   after hackbench:
       Slab:               9540 kB

Performance counter stats for 'hackbench -g 4 -l 10000':
          48463.05 msec cpu-clock                 #    1.995 CPUs utilized
            944154      context-switches          #   19.482 K/sec
              8161      cpu-migrations            #  168.396 /sec
              4117      page-faults               #   84.951 /sec
       52570808507      cycles                    #    1.085 GHz
       65083778667      instructions              #    1.24  insn per cycle
         234990576      branch-misses
       23628671709      cache-references          #  487.561 M/sec
         739599271      cache-misses              #    3.130 % of all cache refs

      24.287392120 seconds time elapsed

       1.509198000 seconds user
      46.942748000 seconds sys

current SLOB:
    memory usage:
        after boot:
            Slab:               7908 kB
        after hackbench:
            Slab:               8544 kB
  
    Time: 189.947
    Performance counter stats for 'hackbench -g 4 -l 10000':
         379413.20 msec cpu-clock                 #    1.997 CPUs utilized          
           8818226      context-switches          #   23.242 K/sec                  
            375186      cpu-migrations            #  988.859 /sec                   
              3954      page-faults               #   10.421 /sec                   
      269923095290      cycles                    #    0.711 GHz                    
      212341582012      instructions              #    0.79  insn per cycle         
        2361087153      branch-misses                                               
       58222839688      cache-references          #  153.455 M/sec                  
        6786521959      cache-misses              #   11.656 % of all cache refs    

     190.002062273 seconds time elapsed

       3.486150000 seconds user
     375.599495000 seconds sys

SLOB with segregated list + slab merging:
    memory usage:
        after boot:
            Slab:               7560 kB
        after hackbench:
            Slab:               7836 kB

hackbench:
    Time: 20.780
    Performance counter stats for 'hackbench -g 4 -l 10000':
          41509.79 msec cpu-clock                 #    1.996 CPUs utilized          
            630032      context-switches          #   15.178 K/sec                  
              8287      cpu-migrations            #  199.640 /sec                   
              4036      page-faults               #   97.230 /sec                   
       57477161020      cycles                    #    1.385 GHz                    
       62775453932      instructions              #    1.09  insn per cycle         
         164902523      branch-misses                                               
       22559952993      cache-references          #  543.485 M/sec                  
         832404011      cache-misses              #    3.690 % of all cache refs    

      20.791893590 seconds time elapsed

       1.423282000 seconds user
      40.072449000 seconds sys
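
The comparison between the three runs can be recomputed from the figures
above with a small C helper. This is only arithmetic on the numbers already
listed (elapsed seconds and the Slab values after boot and after hackbench);
the sketch_* names are hypothetical. From the elapsed times, the modified
SLOB comes out roughly 9.1x faster than current SLOB on this hackbench run.

```c
#include <assert.h>
#include <stdio.h>

/* One benchmark run, copied from the results listed above. */
struct sketch_run {
    const char *name;
    double elapsed_s;     /* "seconds time elapsed" from perf stat */
    unsigned boot_kb;     /* Slab after boot */
    unsigned after_kb;    /* Slab after hackbench */
};

static const struct sketch_run sketch_runs[] = {
    { "SLUB without cpu partials",                 24.287392120, 8672, 9540 },
    { "current SLOB",                             190.002062273, 7908, 8544 },
    { "SLOB with segregated list + slab merging",  20.791893590, 7560, 7836 },
};

/* How many times faster 'other' finished compared to 'base'. */
static double sketch_speedup(const struct sketch_run *base,
                             const struct sketch_run *other)
{
    assert(other->elapsed_s > 0.0);
    return base->elapsed_s / other->elapsed_s;
}

/* Slab growth caused by the hackbench run, in kB. */
static unsigned sketch_growth_kb(const struct sketch_run *r)
{
    return r->after_kb - r->boot_kb;
}

/* Print speedup relative to current SLOB and Slab growth for every run. */
static void sketch_report(void)
{
    const struct sketch_run *slob = &sketch_runs[1];

    for (size_t i = 0; i < sizeof(sketch_runs) / sizeof(sketch_runs[0]); i++) {
        const struct sketch_run *r = &sketch_runs[i];

        printf("%-42s %6.2fx vs current SLOB, Slab growth %u kB\n",
               r->name, sketch_speedup(slob, r), sketch_growth_kb(r));
    }
}
```

Calling sketch_report() from a main() prints one line per configuration.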
-
Thanks,
Hyeonggon
