Re: [MM PATCH V4 6/6] slub: optimize bulk slowpath free by detached freelist
From: Joonsoo Kim <hidden>
Date: 2015-11-05 05:09:18
Also in:
linux-mm
On Wed, Oct 21, 2015 at 09:57:09AM +0200, Jesper Dangaard Brouer wrote:
> On Wed, 14 Oct 2015 14:15:25 +0900 Joonsoo Kim [off-list ref] wrote:
> > On Tue, Sep 29, 2015 at 05:48:26PM +0200, Jesper Dangaard Brouer wrote:
> > > This change focuses on improving the speed of object freeing in the "slowpath" of kmem_cache_free_bulk.
> > >
> > > The calls slab_free (fastpath) and __slab_free (slowpath) have been extended with support for bulk free, which amortizes the overhead of the (locked) cmpxchg_double.
> > >
> > > To use the new bulking feature, we build what I call a detached freelist. The detached freelist takes advantage of three properties:
> > >
> > > 1) the free function call owns the object that is about to be freed, thus writing into this memory is synchronization-free.
> > >
> > > 2) many freelists can co-exist side-by-side in the same slab-page, each with a separate head pointer.
> > >
> > > 3) it is the visibility of the head pointer that needs synchronization.
> > >
> > > Given these properties, the brilliant part is that the detached freelist can be constructed without any need for synchronization. The freelist is constructed directly in the page objects, without any synchronization needed. The detached freelist is allocated on the stack of the function call kmem_cache_free_bulk. Thus, the freelist head pointer is not visible to other CPUs.
> > >
> > > All objects in a SLUB freelist must belong to the same slab-page. Thus, constructing the detached freelist is about matching objects that belong to the same slab-page. The bulk free array is scanned in a progressive manner with a limited look-ahead facility.
> > >
> > > [...]
> > Hello, Jesper.
> >
> > AFAIK, it is uncommon to clear the pointer to an object in the argument array. At least, it is better to comment on it somewhere.
>
> In this case, I think clearing the array is a good thing, as using/referencing objects after they have been freed is a bug (which can be hard to detect).
Okay.
> > Or, how about removing the lookahead facility? Does it have real benefit?
>
> In my earlier patch series I had a version with and without the lookahead facility, just so I could benchmark the difference. With Alex's help we/I tuned the code with the lookahead feature to be just as fast. Thus, I merged the two patches. (Also did testing for the worst case [1].)
>
> I do wonder if the lookahead has any real benefit. In micro benchmarking it might be "just-as-fast", but I do suspect (given the code size increase alone) it can affect real use-cases... Should we remove it?
>
> [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test03.c
If it were not implemented yet, I would suggest starting with a simple one first. But we already have a well-implemented one, so we don't need to remove it. :)

Thanks.
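
For readers following the thread, the detached freelist idea from the quoted patch description can be illustrated with a small userspace sketch. This is not the patch's code. The names page_of, set_freepointer, splice_detached_freelist, PAGE_SZ and LOOKAHEAD are assumptions made for this example, the page lookup is a plain address mask, and a single-word C11 compare-exchange stands in for the kernel's cmpxchg_double (so it ignores the ABA protection the real code needs). It shows objects from the same page being linked through their own memory with no synchronization, consumed array slots being cleared, and one atomic operation publishing the whole list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ   4096u  /* assumed slab-page size (hypothetical) */
#define LOOKAHEAD 3      /* assumed look-ahead limit (hypothetical) */

struct detached_freelist {
    void *page;      /* slab-page every linked object belongs to */
    void *tail;      /* last object in the list */
    void *freelist;  /* head pointer; lives on the caller's stack */
    int cnt;         /* number of linked objects */
};

/* Stand-in for the kernel's object-to-page lookup: mask the address. */
static void *page_of(void *obj)
{
    return (void *)((uintptr_t)obj & ~(uintptr_t)(PAGE_SZ - 1));
}

/* The next-free pointer is stored inside the object (offset 0 here). */
static void set_freepointer(void *obj, void *next)
{
    *(void **)obj = next;
}

/*
 * Scan p[0..size) from the end, link objects that share a page into df,
 * and NULL each consumed slot. Returns the size to pass to the next call,
 * or 0 when no object was skipped.
 */
static size_t build_detached_freelist(void **p, size_t size,
                                      struct detached_freelist *df)
{
    size_t first_skipped = 0;
    int lookahead = LOOKAHEAD;
    void *obj;

    assert(size > 0);
    df->page = df->tail = df->freelist = NULL;
    df->cnt = 0;

    do {                                  /* find last unprocessed entry */
        obj = p[--size];
    } while (!obj && size);
    if (!obj)
        return 0;

    set_freepointer(obj, NULL);           /* start a new list */
    df->page = page_of(obj);
    df->tail = df->freelist = obj;
    df->cnt = 1;
    p[size] = NULL;                       /* mark slot as processed */

    while (size) {
        obj = p[--size];
        if (!obj)
            continue;                     /* already processed */
        if (page_of(obj) == df->page) {
            set_freepointer(obj, df->freelist);
            df->freelist = obj;
            df->cnt++;
            p[size] = NULL;
            continue;
        }
        if (!first_skipped)
            first_skipped = size + 1;     /* resume here next time */
        if (!--lookahead)                 /* limit the look-ahead */
            break;
    }
    return first_skipped;
}

/* One atomic operation makes the whole detached list visible. */
static void splice_detached_freelist(_Atomic(void *) *page_freelist,
                                     struct detached_freelist *df)
{
    void *old = atomic_load(page_freelist);

    do {
        set_freepointer(df->tail, old);   /* still private to this caller */
    } while (!atomic_compare_exchange_weak(page_freelist, &old,
                                           df->freelist));
}
```

A caller would repeat build_detached_freelist with the returned size until it returns 0, splicing each non-empty list into the freelist of the matching page. The atomic cost is then paid once per list rather than once per object.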