Re: [MM PATCH V4.1 5/6] slub: support for bulk free with SLUB freelists


From: Andrew Morton <akpm@linux-foundation.org>
Date: 2015-10-02 21:50:47
Also in: linux-mm

On Fri, 2 Oct 2015 15:40:39 +0200 Jesper Dangaard Brouer [off-list ref] wrote:

> Thus, I need to introduce new code like this patch and at the same time
> have to reduce the number of instruction-cache misses/usage.  In this
> case we solve the problem by kmem_cache_free_bulk() not getting called
> too often. Thus, +17 bytes will hopefully not matter too much... but on
> the other hand we sort-of know that calling kmem_cache_free_bulk() will
> cause icache misses.
>
> I just tested this change on top of my net-use-case patchset... and for
> some strange reason the code with this WARN_ON is faster and has far
> fewer icache misses (1,278,276 vs 2,719,158 L1-icache-load-misses).
>
> Thus, I think we should keep your fix.
>
> I cannot explain why using WARN_ON() is better and causes fewer icache
> misses.  And I hate it when I don't understand every detail.
>
> My theory is, after reading the assembler code, that the UD2
> instruction (from BUG_ON) causes some kind of icache decoder stall
> (Intel experts???).  Now that should not be a problem, as UD2 is
> obviously placed as an unlikely branch and left at the end of the asm
> function call.  But the call to __slab_free() is also placed at the end
> of the asm function (gets inlined from slab_free() as unlikely).  And
> it is actually fairly likely that bulking is calling __slab_free (slub
> slowpath call).

Yes, I was looking at the asm code and the difference is pretty small:
a not-taken ud2 versus a not-taken "call warn_slowpath_null", mainly.
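
To make the comparison concrete, here is a minimal userspace sketch of the two variants, not the kernel's actual macros. All names (bug_on_sketch, warn_on_sketch, warn_slowpath_sketch, the free_checked_* helpers) and the NULL check itself are hypothetical stand-ins. With GCC or clang on x86, __builtin_trap() normally compiles to a ud2, while the warn variant becomes a call to an out-of-line cold function; both sit behind a branch hinted as unlikely.

```c
#include <stddef.h>
#include <stdio.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

static int warn_count;	/* lets a caller observe that a warning fired */

/* Out-of-line cold path, standing in for warn_slowpath_null(). */
__attribute__((noinline, cold))
static void warn_slowpath_sketch(void)
{
	warn_count++;
	fprintf(stderr, "WARNING: unexpected condition\n");
}

/* BUG_ON-like: the unlikely branch ends in a trap (ud2 on x86). */
static inline void bug_on_sketch(int cond)
{
	if (unlikely(cond))
		__builtin_trap();
}

/* WARN_ON-like: the unlikely branch is a call, and execution continues. */
static inline int warn_on_sketch(int cond)
{
	if (unlikely(cond))
		warn_slowpath_sketch();
	return cond;
}

/* Hypothetical callers; the pointer check is for illustration only. */
static int free_checked_bug(const void *object)
{
	bug_on_sketch(object == NULL);
	return 1;
}

static int free_checked_warn(const void *object)
{
	if (warn_on_sketch(object == NULL))
		return 0;	/* warned, skipped the object */
	return 1;
}
```

Compiling this with -O2 -S and comparing free_checked_bug() with free_checked_warn() should show the same kind of small difference in the cold tail of each function.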

But I wouldn't assume that the microbenchmarking is meaningful.  I've
seen shockingly large (and quite repeatable) microbenchmarking
differences from small changes in code which isn't even executed (and
this is one such case, actually).  You add or remove just one byte of
text and half the kernel (or half the .o file?) gets a different
alignment and this seems to change everything.
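
As a rough illustration of why a one-byte shift can matter, the sketch below counts how many cache lines a block of text touches at a given start address. The 64-byte line size and both helper names are assumptions for illustration; nothing here comes from the kernel.

```c
#include <stdint.h>

/* Assumption: 64-byte L1 instruction-cache lines, common on x86-64. */
#define CACHELINE_BYTES 64u

/* Offset of an address within its cache line. */
static unsigned int cacheline_offset(uintptr_t addr)
{
	return (unsigned int)(addr % CACHELINE_BYTES);
}

/* Number of cache lines touched by the byte range [addr, addr + len). */
static unsigned int cachelines_spanned(uintptr_t addr, unsigned int len)
{
	if (len == 0)
		return 0;
	return (unsigned int)((addr + len - 1) / CACHELINE_BYTES
			      - addr / CACHELINE_BYTES + 1);
}
```

For example, a 64-byte hot loop that starts on a line boundary fits in one line, but the same loop moved up by a single byte straddles two. That is one plausible way in which code that is never executed could still change the icache-miss counts of the code around it.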

Deleting the BUG altogether sounds like the best solution.  As long as the
kernel crashes in some manner, we'll be able to work out what happened.
And it's a can't-happen case anyway, isn't it?
