Re: [v3 0/9] parallelized "struct page" zeroing
From: Benjamin Herrenschmidt <hidden>
Date: 2017-05-16 23:52:00
Also in:
linux-mm, linux-s390, lkml, sparclinux
On Fri, 2017-05-12 at 13:37 -0400, David Miller wrote:
> > Right now it is larger, but what I suggested is to add a new optimized
> > routine just for this case, which would do STBI for 64-bytes but
> > without membar (do membar at the end of memmap_init_zone() and
> > deferred_init_memmap())
> >
> > #define struct_page_clear(page)                                 \
> >         __asm__ __volatile__(                                   \
> >         "stxa   %%g0, [%0]%2\n"                                 \
> >         "stxa   %%xg0, [%0 + %1]%2\n"                           \
> >         : /* No output */                                       \
> >         : "r" (page), "r" (0x20), "i"(ASI_BLK_INIT_QUAD_LDD_P))
> >
> > And insert it into __init_single_page() instead of memset()
> >
> > The final result is 4.01s/T which is even faster compared to current
> > 4.97s/T
>
> Ok, indeed, that would work.

On ppc64, that might not.

We have a dcbz instruction that clears an entire cache line at once.
That's what we use for memsets and page clearing. However, 64 bytes is
half a cache line on modern processors, so we can't use it with that
semantic and would have to fall back to the slower stores.

Cheers,
Ben.