Re: Prezeroing V2 [0/3]: Why and When it works

From: Paul Mackerras <hidden>
Date: 2004-12-23 23:01:21
Also in: lkml

Andrew Morton writes:

> When the workload is a gcc run, the pagefault handler dominates the system
> time.  That's the page zeroing.

For a program which uses a lot of heap and doesn't fork, that sounds
reasonable.

> x86's movnta instructions provide a way of initialising memory without
> trashing the caches and it has pretty good bandwidth, I believe.  We should
> wire that up to these patches and see if it speeds things up.

Yes.  I don't know the movnta instruction, but surely, whatever scheme
is used, there has to be a snoop for every cache line's worth of
memory that is zeroed.

The other point is that having the page hot in the cache may well be a
benefit to the program.  Using any sort of cache-bypassing zeroing
might not actually make things faster, when the user time as well as
the system time is taken into account.

> > I did some measurements once on my G5 powermac (running a ppc64 linux
> > kernel) of how long clear_page takes, and it only takes 96ns for a 4kB
> > page.
>
> 40GB/s.  Is that straight into L1 or does the measurement include writeback?

It is the average elapsed time in clear_page, so it would include the
writeback of any cache lines displaced by the zeroing, but not the
writeback of the newly-zeroed cache lines (which we hope will be
modified by the program before they get written back anyway).
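
For reference, the 40GB/s figure follows directly from the two numbers in the measurement: 4096 bytes in 96ns is roughly 42.7 GB/s. A minimal sketch of that arithmetic (the function names are made up for illustration and are not part of any kernel interface):

```c
#include <stdio.h>

/* Effective zeroing bandwidth in decimal GB/s.  One byte per nanosecond
 * is 1e9 bytes per second, so bytes / ns is already in GB/s. */
double zeroing_bandwidth_gbps(double bytes, double elapsed_ns)
{
    return bytes / elapsed_ns;
}

/* Hypothetical helper: print the figure for the numbers quoted in this
 * thread (a 4kB page cleared in 96ns). */
void print_quoted_figure(void)
{
    printf("%.1f GB/s\n", zeroing_bandwidth_gbps(4096.0, 96.0));
}
```

This counts only the bytes zeroed per unit of elapsed time; it does not describe traffic to main memory.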

This is using the dcbz (data cache block zero) instruction, which
establishes a cache line in modified state with zero contents without
any memory traffic other than a cache line kill transaction sent to
the other CPUs and possible writeback of a dirty cache line displaced
by the newly-zeroed cache line.  The new cache line is established in
the L2 cache, because the L1 is write-through on the G5, and all
stores and dcbz instructions have to go to the L2 cache.
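
A rough sketch of the kind of loop described above, not the kernel's actual clear_page: one dcbz per cache block across the page. The names sketch_clear_page, SKETCH_PAGE_SIZE, SKETCH_CACHE_LINE and SKETCH_USE_DCBZ are made up for illustration, and the 128-byte block size is an assumption that must be checked against the target CPU. The dcbz path is only compiled when explicitly enabled on a 64-bit PowerPC build; everywhere else the sketch falls back to memset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE  4096u  /* 4kB page, as in the measurement above */
#define SKETCH_CACHE_LINE 128u   /* ASSUMED dcbz block size; verify for your CPU */

/* Zero one page.  The page must be aligned to the cache block size. */
void sketch_clear_page(void *page)
{
    unsigned char *p = page;

    assert(((uintptr_t)p % SKETCH_CACHE_LINE) == 0);

#if defined(__powerpc64__) && defined(SKETCH_USE_DCBZ)
    /* One dcbz per cache block: the block is established in the cache
     * with zero contents without first being read from memory. */
    for (size_t off = 0; off < SKETCH_PAGE_SIZE; off += SKETCH_CACHE_LINE)
        __asm__ volatile("dcbz 0,%0" : : "r"(p + off) : "memory");
#else
    /* Portable fallback: ordinary stores, no cache-block instruction. */
    memset(p, 0, SKETCH_PAGE_SIZE);
#endif
}
```

If the real block size is smaller than the assumed constant, the dcbz loop would skip parts of the page, which is why the constant should come from the hardware rather than be hard-coded.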

Thus, on the G5 (and POWER4, which is similar) I don't think there
will be much if any benefit from having pre-zeroed cache-cold pages.
We can establish the zero lines in cache much faster using dcbz than
we can by reading them in from main memory.  If the program uses only
a few cache lines out of each new page, then reading them from memory
might be faster, but that seems unlikely.
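
The trade-off in the last sentence can be put as a crude break-even estimate. This is a sketch under stated assumptions, not a measurement: it charges a pre-zeroed cache-cold page one memory miss per cache line the program touches, charges dcbz zeroing a fixed time per page, and ignores everything else (L2 hit latency, writeback of displaced lines, prefetching). The function name and the miss latency parameter are hypothetical; the thread gives no figure for miss latency.

```c
/* Number of touched cache lines at which reading pre-zeroed lines from
 * memory costs the same as zeroing the whole page in cache.
 * Below this count the cache-cold pre-zeroed page wins; above it,
 * zeroing in cache wins.  miss_latency_ns is an input the caller must
 * supply from their own measurements. */
double break_even_lines(double clear_page_ns, double miss_latency_ns)
{
    return clear_page_ns / miss_latency_ns;
}
```

With the 96ns clear_page time and a purely hypothetical 48ns miss, break_even_lines(96.0, 48.0) gives 2.0: under this model a program would have to touch at most two lines of each new page for the pre-zeroed page to come out ahead.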

Paul.
