Thread: 11 messages, 3 authors, 2019-05-20

Re: [PATCH v2 4/4] net: apply __GFP_NO_AUTOINIT to AF_UNIX sk_buff allocations

From: Alexander Potapenko <glider@google.com>
Date: 2019-05-17 13:51:00
Also in: linux-mm

On Fri, May 17, 2019 at 10:49 AM Alexander Potapenko [off-list ref] wrote:
On Fri, May 17, 2019 at 2:26 AM Kees Cook [off-list ref] wrote:
On Thu, May 16, 2019 at 09:53:01AM -0700, Kees Cook wrote:
On Tue, May 14, 2019 at 04:35:37PM +0200, Alexander Potapenko wrote:
Add sock_alloc_send_pskb_noinit(), which is similar to
sock_alloc_send_pskb(), but allocates with __GFP_NO_AUTOINIT.
This helps reduce the slowdown on hackbench in the init_on_alloc mode
from 6.84% to 3.45%.
Out of curiosity, why create a new function rather than add a gfp flag
argument to sock_alloc_send_pskb() and update its callers? (There are
only 6 callers, and this change already updates 2 of those.)
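To make the two API shapes in this question concrete, here is a userspace model (not kernel code) of each: a single allocator that takes a gfp-style flags argument, and a separate `_noinit` entry point that wraps it. Everything prefixed `model_` is hypothetical, `malloc()` stands in for the kernel allocator, and `memset()` stands in for the init_on_alloc zeroing. The real `sock_alloc_send_pskb()` and `__GFP_NO_AUTOINIT` are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins: NOT the kernel's gfp_t or its flag values. */
typedef unsigned int model_gfp_t;
#define MODEL_GFP_KERNEL      0x0u
#define MODEL_GFP_NO_AUTOINIT 0x1u

/* Models booting with init_on_alloc=1. */
static bool model_init_on_alloc = true;

/* Shape 1: one allocator; every caller passes a gfp-style mask. */
static void *model_alloc_send(size_t size, model_gfp_t gfp)
{
	void *p = malloc(size);

	/* Auto-initialize unless the caller explicitly opted out. */
	if (p && model_init_on_alloc && !(gfp & MODEL_GFP_NO_AUTOINIT))
		memset(p, 0, size);
	return p;
}

/* Shape 2: a dedicated _noinit entry point wrapping the same allocator. */
static void *model_alloc_send_noinit(size_t size)
{
	return model_alloc_send(size, MODEL_GFP_NO_AUTOINIT);
}
```

With the flags-argument shape, each of the six callers passes its own mask; with the wrapper shape, the existing callers stay untouched and only the converted call sites switch to the `_noinit` variant.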
[...]
Slowdown for the initialization features compared to init_on_free=0,
init_on_alloc=0:

hackbench, init_on_free=1:  +7.71% sys time (st.err 0.45%)
hackbench, init_on_alloc=1: +3.45% sys time (st.err 0.86%)
So I've run some of my own wall-clock timings of kernel builds (which
should be a pretty big "worst case" situation), and I see much smaller
performance changes:
How many cores were you using? I suspect the numbers may vary a bit
depending on that.
[...]
everything off
        Run times: 289.18 288.61 289.66 287.71 287.67
        Min: 287.67 Max: 289.66 Mean: 288.57 Std Dev: 0.79
                baseline

init_on_alloc=1
        Run times: 289.72 286.95 287.87 287.34 287.35
        Min: 286.95 Max: 289.72 Mean: 287.85 Std Dev: 0.98
                0.25% faster (within the std dev noise)

init_on_free=1
        Run times: 303.26 301.44 301.19 301.55 301.39
        Min: 301.19 Max: 303.26 Mean: 301.77 Std Dev: 0.75
                4.57% slower

init_on_free=1 with the PAX_MEMORY_SANITIZE slabs excluded:
        Run times: 299.19 299.85 298.95 298.23 298.64
        Min: 298.23 Max: 299.85 Mean: 298.97 Std Dev: 0.55
                3.60% slower

So the tuning certainly improved things by about 1 percentage point
(4.57% down to 3.60%). My perf numbers don't show the 24% hit you were
seeing at all, though.
Note that 24% is the _sys_ time slowdown. The wall time slowdown seen
in this case was 8.34%.
I've collected more stats running QEMU with different numbers of cores.
The slowdown values of init_on_free compared to baseline are:
2 CPUs - 5.94% for wall time (20.08% for sys time)
6 CPUs - 7.43% for wall time (23.55% for sys time)
12 CPUs - 8.41% for wall time (24.25% for sys time)
24 CPUs - 9.49% for wall time (17.98% for sys time)

I'm building a defconfig of some fixed KMSAN tree with Clang, but that
shouldn't matter much.
[...]
In the commit log it might be worth mentioning that this is only
changing the init_on_alloc case (in case it's not already obvious to
folks). Perhaps there needs to be a split of __GFP_NO_AUTOINIT into
__GFP_NO_AUTO_ALLOC_INIT and __GFP_NO_AUTO_FREE_INIT? Right now
__GFP_NO_AUTOINIT is only checked for init_on_alloc:
I was obviously crazy here. :) GFP isn't present for free(), but a SLAB
flag works (as was done in PAX_MEMORY_SANITIZE). I'll send the patch I
used for the above timing test.
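The point here is that the free path never sees a gfp mask, so an opt-out for init_on_free has to live on the slab cache itself. The following is a minimal userspace model of that idea, assuming a per-cache flag that is fixed when the cache is created. `MODEL_CACHE_NO_FREE_INIT`, `struct model_cache` and `model_free_hook()` are hypothetical names. They are not the SLAB flag from Kees's patch or from PAX_MEMORY_SANITIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-cache flag; not a real SLAB_* constant. */
#define MODEL_CACHE_NO_FREE_INIT 0x1u

struct model_cache {
	size_t obj_size;
	unsigned int flags;	/* assumed fixed when the cache is created */
};

/* Models booting with init_on_free=1. */
static bool model_init_on_free = true;

/*
 * Free-time hook: it receives only the cache and the object, never a
 * gfp mask, so the opt-out decision comes from the cache's flags.
 * Returns true if the object was wiped; the caller releases the memory.
 */
static bool model_free_hook(const struct model_cache *c, void *obj)
{
	assert(c != NULL && obj != NULL);
	if (!model_init_on_free || (c->flags & MODEL_CACHE_NO_FREE_INIT))
		return false;
	memset(obj, 0, c->obj_size);
	return true;
}
```

By contrast, the allocation side can keep checking a gfp bit per call. That asymmetry is why a single __GFP_NO_AUTOINIT flag only ever affects init_on_alloc.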

--
Kees Cook


--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg

