Re: Prezeroing V2 [0/3]: Why and When it works
From: Paul Mackerras <hidden>
Date: 2004-12-23 23:01:21
Also in: lkml
Andrew Morton writes:
> When the workload is a gcc run, the pagefault handler dominates the system time. That's the page zeroing.
For a program which uses a lot of heap and doesn't fork, that sounds reasonable.
> x86's movnta instructions provide a way of initialising memory without trashing the caches and it has pretty good bandwidth, I believe. We should wire that up to these patches and see if it speeds things up.
Yes. I don't know the movnta instruction, but surely, whatever scheme is used, there has to be a snoop for every cache line's worth of memory that is zeroed. The other point is that having the page hot in the cache may well be a benefit to the program. Using any sort of cache-bypassing zeroing might not actually make things faster, when the user time as well as the system time is taken into account.
> > I did some measurements once on my G5 powermac (running a ppc64 linux kernel) of how long clear_page takes, and it only takes 96ns for a 4kB page.
>
> 40GB/s. Is that straight into L1 or does the measurement include writeback?
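As a quick check of the figure quoted above, a 4kB page cleared in 96ns works out to a little over 40GB/s. The helper below is only an illustration of that arithmetic; the function name is hypothetical, and 1 GB is taken as 10^9 bytes.

```c
#include <stdio.h>

/* Hypothetical helper, for illustration only.
 * Bytes divided by nanoseconds is numerically equal to GB/s
 * when 1 GB is taken as 1e9 bytes. */
double clear_bandwidth_gbps(double bytes, double elapsed_ns)
{
    return bytes / elapsed_ns;
}

/* Print the bandwidth implied by one measured clear time. */
void print_clear_bandwidth(double bytes, double elapsed_ns)
{
    printf("%.0f bytes in %.0f ns = %.1f GB/s\n",
           bytes, elapsed_ns, clear_bandwidth_gbps(bytes, elapsed_ns));
}
```

Calling `print_clear_bandwidth(4096.0, 96.0)` reports the rate for the numbers in the thread, which rounds to the 40GB/s mentioned.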
It is the average elapsed time in clear_page, so it would include the writeback of any cache lines displaced by the zeroing, but not the writeback of the newly-zeroed cache lines (which we hope will be modified by the program before they get written back anyway).

This is using the dcbz (data cache block zero) instruction, which establishes a cache line in modified state with zero contents without any memory traffic other than a cache line kill transaction sent to the other CPUs and possible writeback of a dirty cache line displaced by the newly-zeroed cache line. The new cache line is established in the L2 cache, because the L1 is write-through on the G5, and all stores and dcbz instructions have to go to the L2 cache.

Thus, on the G5 (and POWER4, which is similar) I don't think there will be much if any benefit from having pre-zeroed cache-cold pages. We can establish the zero lines in cache much faster using dcbz than we can by reading them in from main memory. If the program uses only a few cache lines out of each new page, then reading them from memory might be faster, but that seems unlikely.

Paul.
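For readers unfamiliar with dcbz, the loop described above can be sketched roughly as follows. This is a user-space illustration under stated assumptions, not the kernel's clear_page: the function name, the PAGE_BYTES constant, and the line_bytes parameter are all hypothetical, and the block size that dcbz actually zeroes depends on the CPU and how it is configured (128 bytes is assumed for the G5 here and should be checked against the hardware). On non-PowerPC builds the sketch falls back to memset so that it still compiles and runs.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_BYTES 4096u   /* assumed page size, matching the 4kB in the thread */

/* Hypothetical helper: zero one page, one cache line at a time.
 * line_bytes must equal the block size dcbz operates on for the target
 * CPU (an assumption the caller has to get right), and page must be
 * aligned to line_bytes and be ordinary cacheable memory.
 * Returns 0 on success, -1 if line_bytes does not evenly divide the page. */
int zero_page_by_line(void *page, size_t line_bytes)
{
    char *p = page;

    if (line_bytes == 0 || PAGE_BYTES % line_bytes != 0)
        return -1;

    for (size_t off = 0; off < PAGE_BYTES; off += line_bytes) {
#if defined(__powerpc__) || defined(__powerpc64__)
        /* Establish a zeroed line in cache without reading it from memory. */
        __asm__ volatile("dcbz 0,%0" : : "r"(p + off) : "memory");
#else
        /* Portable fallback: ordinary stores, no cache-line establish. */
        memset(p + off, 0, line_bytes);
#endif
    }
    return 0;
}
```

A caller would pass a line-aligned 4kB buffer and the line size, for example `zero_page_by_line(buf, 128)`. Only the PowerPC path reflects the behaviour discussed in the message; the fallback is there for portability and says nothing about x86 non-temporal stores.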