Thread: 46 messages, 7 authors, 2017-06-01

Re: [v3 0/9] parallelized "struct page" zeroing

From: Matthew Wilcox <willy@infradead.org>
Date: 2017-05-10 17:17:09
Also in: linux-mm, linux-s390, lkml, sparclinux

On Wed, May 10, 2017 at 11:19:43AM -0400, David Miller wrote:
> From: Michal Hocko <mhocko@kernel.org>
> Date: Wed, 10 May 2017 16:57:26 +0200
>
> > Have you measured that? I do not think it would be super hard to
> > measure. I would be quite surprised if this added much if anything at
> > all as the whole struct page should be in the cache line already. We do
> > set reference count and other struct members. Almost nobody should be
> > looking at our page at this time and stealing the cache line. On the
> > other hand a large memcpy will basically wipe everything away from the
> > cpu cache. Or am I missing something?
>
> I guess it might be clearer if you understand what the block
> initializing stores do on sparc64.  There are no memory accesses at
> all.
>
> The cpu just zeros out the cache line, that's it.
>
> No L3 cache line is allocated.  So this "wipe everything" behavior
> will not happen in the L3.

There's either something wrong with your explanation or my reading
skills :-)

"There are no memory accesses"
"No L3 cache line is allocated"

You can have one or the other ... either the CPU sends a cacheline-sized
write of zeroes to memory without allocating an L3 cache line (maybe
using the store buffer?), or the CPU allocates an L3 cache line and sets
its contents to zeroes, probably putting it in the last way of the set
so it's the first thing to be evicted if not touched.

Or there's some magic in the memory bus protocol where the CPU gets to
tell the DRAM "hey, clear these cache lines".  Although that's also a
memory access of sorts ...
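
To make the cache-pollution question concrete, here is a rough toy model of a single cache set. It is only a sketch under stated assumptions, not a description of what sparc64 hardware actually does. The 4-way associativity, strict LRU replacement, and every name in it (cache_set, zero_alloc_lru, hot_lines_surviving, and so on) are hypothetical. It compares an ordinary allocating store, a zeroing store that places the line in the last (LRU) way, and a zeroing store that allocates nothing.

```c
#include <assert.h>
#include <string.h>

#define WAYS  4        /* assumed associativity, purely illustrative */
#define EMPTY (-1L)

/* One cache set: way[0] is most recently used, way[WAYS-1] is the next victim. */
struct cache_set {
    long way[WAYS];
};

enum zero_policy { ZERO_ALLOC_MRU, ZERO_ALLOC_LRU, ZERO_NO_ALLOC };

static void set_init(struct cache_set *s)
{
    for (int i = 0; i < WAYS; i++)
        s->way[i] = EMPTY;
}

/* Return the way holding tag, or -1 if the line is not resident. */
static int set_find(const struct cache_set *s, long tag)
{
    for (int i = 0; i < WAYS; i++)
        if (s->way[i] == tag)
            return i;
    return -1;
}

/* Ordinary access: promote on a hit; on a miss evict the LRU way and insert at MRU. */
static void access_line(struct cache_set *s, long tag)
{
    int pos = set_find(s, tag);

    if (pos < 0)
        pos = WAYS - 1;                 /* miss: the LRU way gets overwritten */
    memmove(&s->way[1], &s->way[0], (size_t)pos * sizeof(s->way[0]));
    s->way[0] = tag;
}

/* Zeroing store that allocates in the last way: only the current victim is replaced. */
static void zero_alloc_lru(struct cache_set *s, long tag)
{
    assert(set_find(s, tag) < 0);       /* model assumes the line is not cached yet */
    s->way[WAYS - 1] = tag;
}

/* Zeroing store that bypasses the cache: nothing is allocated or evicted. */
static void zero_no_alloc(struct cache_set *s, long tag)
{
    assert(set_find(s, tag) < 0);       /* model assumes the line is not cached yet */
    (void)s;
    (void)tag;
}

/*
 * Fill the set with WAYS "hot" lines (tags 1..WAYS), zero nzero other lines
 * under the given policy, and report how many hot lines are still resident.
 */
int hot_lines_surviving(enum zero_policy policy, int nzero)
{
    struct cache_set s;
    int survivors = 0;

    set_init(&s);
    for (long tag = 1; tag <= WAYS; tag++)
        access_line(&s, tag);

    for (long tag = 100; tag < 100 + nzero; tag++) {
        switch (policy) {
        case ZERO_ALLOC_MRU: access_line(&s, tag);    break;
        case ZERO_ALLOC_LRU: zero_alloc_lru(&s, tag); break;
        case ZERO_NO_ALLOC:  zero_no_alloc(&s, tag);  break;
        }
    }

    for (long tag = 1; tag <= WAYS; tag++)
        if (set_find(&s, tag) >= 0)
            survivors++;
    return survivors;
}
```

In this model, zeroing three lines leaves 1 of the 4 hot lines resident with ordinary allocation, 3 with last-way allocation, and all 4 when nothing is allocated. That is the trade-off being asked about. Real replacement policies and cache hierarchies are considerably more involved.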