Thread: 46 messages, 7 authors, 2017-06-01

Re: [v3 0/9] parallelized "struct page" zeroing

From: Benjamin Herrenschmidt <hidden>
Date: 2017-05-16 23:52:00
Also in: linux-mm, linux-s390, lkml, sparclinux

On Fri, 2017-05-12 at 13:37 -0400, David Miller wrote:
quoted
Right now it is larger, but what I suggested is to add a new optimized
routine just for this case, which would do STBI for 64 bytes but
without membar (do membar at the end of memmap_init_zone() and
deferred_init_memmap()).

#define struct_page_clear(page)                                 \
         __asm__ __volatile__(                                   \
         "stxa   %%g0, [%0]%2\n"                                 \
         "stxa   %%g0, [%0 + %1]%2\n"                            \
         : /* No output */                                       \
         : "r" (page), "r" (0x20), "i"(ASI_BLK_INIT_QUAD_LDD_P))

And insert it into __init_single_page() instead of memset().

The final result is 4.01s/T, which is even faster than the current
4.97s/T.
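
The deferred-barrier idea in the proposal above can be sketched in portable C. This is only an analogy, not the SPARC code: plain memset() stands in for the STBI stores, atomic_thread_fence() stands in for membar, and the names fake_page, fake_page_clear(), fake_init_range() and fence_count are hypothetical, introduced purely for illustration.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define STRUCT_PAGE_SIZE 64u            /* size of the object being cleared */

/* Hypothetical stand-in for struct page: just 64 opaque bytes. */
struct fake_page {
    unsigned char bytes[STRUCT_PAGE_SIZE];
};

/* Instrumentation only: counts how many fences were issued. */
static unsigned long fence_count;

/* Clear one object. Deliberately issues no barrier of its own. */
static inline void fake_page_clear(struct fake_page *p)
{
    memset(p, 0, sizeof(*p));
}

/* Clear a whole range, then order all of the stores with a single fence
 * at the end, instead of paying for one fence per object. */
static void fake_init_range(struct fake_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        fake_page_clear(&pages[i]);

    /* One barrier for the whole range (the membar in the proposal). */
    atomic_thread_fence(memory_order_seq_cst);
    fence_count++;
}
```

The point of the pattern is that the cost of the barrier is paid once per range rather than once per 64-byte object.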
Ok, indeed, that would work.
On ppc64, that might not work. We have a dcbz instruction that clears an
entire cache line at once. That's what we use for memset and page
clearing. However, 64 bytes is half a cache line on modern processors,
so we can't use it with those semantics and would have to fall back to
slower stores.
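
To make the granularity problem concrete, here is a small simulation in plain C. It assumes a 128-byte cache line (inferred from "64 bytes is half a cache line") and models the line-zeroing instruction as a memset() of the whole containing line. sim_line_zero() and line_zero_clobbers_neighbour() are hypothetical helpers for illustration, not kernel functions, and no real dcbz is executed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE_SIZE  128u   /* assumed line size for this sketch */
#define STRUCT_PAGE_SIZE 64u    /* size of the object being cleared */

/* Stand-in for a cache-line-zeroing instruction: it ignores the low
 * offset bits and zeroes the whole line containing byte offset `off`. */
static void sim_line_zero(uint8_t *mem, size_t off)
{
    /* The masking below only works for power-of-two line sizes. */
    assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1)) == 0);

    size_t line_start = off & ~(size_t)(CACHE_LINE_SIZE - 1);
    memset(mem + line_start, 0, CACHE_LINE_SIZE);
}

/* Returns 1 if clearing slot `idx` (0..3) with a single line-zero also
 * wipes bytes belonging to a different 64-byte slot, 0 otherwise. */
static int line_zero_clobbers_neighbour(size_t idx)
{
    uint8_t mem[2 * CACHE_LINE_SIZE];   /* two lines, four 64-byte slots */

    assert(idx < sizeof(mem) / STRUCT_PAGE_SIZE);
    memset(mem, 0xAA, sizeof(mem));     /* pretend every slot is initialised */

    sim_line_zero(mem, idx * STRUCT_PAGE_SIZE);

    for (size_t i = 0; i < sizeof(mem); i++) {
        size_t owner = i / STRUCT_PAGE_SIZE;
        if (owner != idx && mem[i] == 0)
            return 1;                   /* a neighbouring slot was zeroed */
    }
    return 0;
}
```

Under these assumptions every slot shares its line with one neighbour, so a single line-zero always wipes 64 bytes that belong to another object. That is why the per-object clear would have to use ordinary stores instead.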

Cheers,
Ben.