Re: [PATCH v2 4/6] x86: Add clear_page_nocache
From: Jan Beulich <hidden>
Date: 2012-08-09 15:22:14
Also in:
linux-mips, linux-mm, linuxppc-dev, lkml, sparclinux
On 09.08.12 at 17:03, "Kirill A. Shutemov" [off-list ref] wrote:
> From: Andi Kleen <redacted>
>
> Add a cache avoiding version of clear_page. Straight forward integer variant
> of the existing 64bit clear_page, for both 32bit and 64bit.
While on 64-bit this is fine, I fail to see how you avoid using the SSE2 instruction on non-SSE2 systems.
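To illustrate the concern, here is a minimal user-space sketch (not the kernel code, which would use its own CPU feature checks) of guarding the non-temporal store path behind an SSE2 test and falling back to an ordinary clear otherwise. The names clear_page_nocache_sketch, clear_page_movnti and SKETCH_PAGE_SIZE are made up for this example; _mm_stream_si32() is the compiler intrinsic that emits movnti, and __builtin_cpu_supports() is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096	/* hypothetical; assumes 4 KiB pages */

#if defined(__i386__) || defined(__x86_64__)
#include <emmintrin.h>

/* Zero the page with 32-bit non-temporal stores (movnti, needs SSE2). */
__attribute__((target("sse2")))
static void clear_page_movnti(void *page)
{
	int *p = page;
	size_t i;

	for (i = 0; i < SKETCH_PAGE_SIZE / sizeof(int); i++)
		_mm_stream_si32(p + i, 0);
	/* Order the weakly-ordered streaming stores before later stores. */
	_mm_sfence();
}
#endif

/* Use movnti only when the CPU reports SSE2; otherwise clear normally. */
static void clear_page_nocache_sketch(void *page)
{
	/* movnti with a 32-bit operand wants a 4-byte aligned address. */
	assert(((uintptr_t)page & 3) == 0);

#if defined(__i386__) || defined(__x86_64__)
	if (__builtin_cpu_supports("sse2")) {
		clear_page_movnti(page);
		return;
	}
#endif
	memset(page, 0, SKETCH_PAGE_SIZE);
}
```

On a 32-bit CPU without SSE2 the memset() branch is taken, so movnti is never executed there.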
> Also add the necessary glue for highmem including a layer that non cache
> coherent architectures that use the virtual address for flushing can
> hook in. This is not needed on x86 of course.
>
> If an architecture wants to provide cache avoiding version of clear_page
> it should to define ARCH_HAS_USER_NOCACHE to 1 and implement
> clear_page_nocache() and clear_user_highpage_nocache().
>
> Signed-off-by: Andi Kleen <redacted>
> Signed-off-by: Kirill A. Shutemov <redacted>
> ---
>  arch/x86/include/asm/page.h          |    2 ++
>  arch/x86/include/asm/string_32.h     |    5 +++++
>  arch/x86/include/asm/string_64.h     |    5 +++++
>  arch/x86/lib/Makefile                |    1 +
>  arch/x86/lib/clear_page_nocache_32.S |   30 ++++++++++++++++++++++++++++++
>  arch/x86/lib/clear_page_nocache_64.S |   29 +++++++++++++++++++++++++++++
Couldn't this more reasonably go into clear_page_{32,64}.S?
>  arch/x86/mm/fault.c                  |    7 +++++++
>  7 files changed, 79 insertions(+), 0 deletions(-)
>  create mode 100644 arch/x86/lib/clear_page_nocache_32.S
>  create mode 100644 arch/x86/lib/clear_page_nocache_64.S
> ...
> --- /dev/null
> +++ b/arch/x86/lib/clear_page_nocache_32.S
> @@ -0,0 +1,30 @@
> +#include <linux/linkage.h>
> +#include <asm/dwarf2.h>
> +
> +/*
> + * Zero a page avoiding the caches
> + * rdi	page
Wrong comment: this is the 32-bit file, so there is no %rdi, and the code below takes the page address from %eax.
> + */
> +ENTRY(clear_page_nocache)
> +	CFI_STARTPROC
> +	mov    %eax,%edi
You need to pick a different register here (e.g. %edx), since %edi has to be preserved by all functions called from C.
> +	xorl   %eax,%eax
> +	movl   $4096/64,%ecx
> +	.p2align 4
> +.Lloop:
> +	decl	%ecx
> +#define PUT(x) movnti %eax,x*8(%edi) ; movnti %eax,x*8+4(%edi)
Is doing twice as much unrolling as on 64-bit really worth it?

Jan