Re: [PATCH v2 4/6] x86: Add clear_page_nocache

From: Jan Beulich <hidden>
Date: 2012-08-13 12:03:10
Also in: linux-mm, linux-sh, linuxppc-dev, lkml, sparclinux

On 13.08.12 at 13:43, "Kirill A. Shutemov" [off-list ref] wrote:
> On Thu, Aug 09, 2012 at 04:22:04PM +0100, Jan Beulich wrote:
>> On 09.08.12 at 17:03, "Kirill A. Shutemov" [off-list ref] wrote:
>> ...
>>> ---
>>>  arch/x86/include/asm/page.h          |    2 ++
>>>  arch/x86/include/asm/string_32.h     |    5 +++++
>>>  arch/x86/include/asm/string_64.h     |    5 +++++
>>>  arch/x86/lib/Makefile                |    1 +
>>>  arch/x86/lib/clear_page_nocache_32.S |   30 ++++++++++++++++++++++++++++++
>>>  arch/x86/lib/clear_page_nocache_64.S |   29 +++++++++++++++++++++++++++++
>>
>> Couldn't this more reasonably go into clear_page_{32,64}.S?
>
> We don't have clear_page_32.S.

Sure, but you're introducing a file anyway. Fold the new code into
the existing file for 64-bit, and create a new, similarly named one
for 32-bit.

>>> +	xorl   %eax,%eax
>>> +	movl   $4096/64,%ecx
>>> +	.p2align 4
>>> +.Lloop:
>>> +	decl	%ecx
>>> +#define PUT(x) movnti %eax,x*8(%edi) ; movnti %eax,x*8+4(%edi)
>>
>> Is doing twice as much unrolling as on 64-bit really worth it?
>
> Moving 64 bytes per cycle is faster on Sandy Bridge, but slower on
> Westmere. Any preference? ;)

If it's not a clear win, I'd favor the 8-stores-per-cycle variant,
matching x86-64.

Jan
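
For readers following the unrolling question, here is a rough user-space C sketch of the two 32-bit variants under discussion. It is not the patch code. It reads "per cycle" as "per loop iteration", and it assumes the loop body holds eight PUT() invocations (the quoted hunk shows only the macro definition), which would give 16 four-byte stores, or 64 bytes, per iteration. The function names, PAGE_BYTES, NT_STORE32 and NT_FENCE are made up for illustration. On SSE2 builds the store is the `_mm_stream_si32` intrinsic, which corresponds to a 32-bit movnti; elsewhere the sketch falls back to ordinary cached stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
/* _mm_stream_si32 is the SSE2 intrinsic for a 32-bit MOVNTI store. */
#define NT_STORE32(p) _mm_stream_si32((int *)(p), 0)
#define NT_FENCE()    _mm_sfence()
#else
/* Assumption: without SSE2, fall back to ordinary (cached) stores. */
#define NT_STORE32(p) (*(volatile uint32_t *)(p) = 0)
#define NT_FENCE()    ((void)0)
#endif

#define PAGE_BYTES 4096u

/* Mirrors the quoted PUT(x): two 4-byte stores covering bytes x*8 .. x*8+7. */
#define PUT(x) do { NT_STORE32(p + (x) * 2); NT_STORE32(p + (x) * 2 + 1); } while (0)

/* Variant A (hypothetical name): 16 stores = 64 bytes per iteration. */
void clear_page_nt_16_stores(void *page)
{
	uint32_t *p = page;

	assert(((uintptr_t)page & 3) == 0);	/* 4-byte aligned input */
	for (unsigned int i = 0; i < PAGE_BYTES / 64; i++) {
		PUT(0); PUT(1); PUT(2); PUT(3);
		PUT(4); PUT(5); PUT(6); PUT(7);
		p += 16;			/* advance 64 bytes */
	}
	NT_FENCE();	/* order the streaming stores before later stores */
}

/* Variant B (hypothetical name): 8 stores = 32 bytes per iteration,
 * the same store count per iteration as an 8-store 64-bit loop. */
void clear_page_nt_8_stores(void *page)
{
	uint32_t *p = page;

	assert(((uintptr_t)page & 3) == 0);
	for (unsigned int i = 0; i < PAGE_BYTES / 32; i++) {
		PUT(0); PUT(1); PUT(2); PUT(3);
		p += 8;				/* advance 32 bytes */
	}
	NT_FENCE();
}

/* Returns 1 if every byte of the page is zero, 0 otherwise. */
int page_is_zero(const void *page)
{
	const unsigned char *b = page;

	for (size_t i = 0; i < PAGE_BYTES; i++)
		if (b[i] != 0)
			return 0;
	return 1;
}
```

Both variants issue the same 1024 stores per page; they differ only in how many loop iterations those stores are spread over (64 versus 128). Which one is faster would have to be measured on the target microarchitecture, as the Sandy Bridge and Westmere numbers quoted above suggest.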
