Thread (20 messages) 20 messages, 8 authors, 2017-08-18

Re: Optimised memset64/memset32 for powerpc

From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Date: 2017-03-20 21:24:21

On Mon, 2017-03-20 at 14:14 -0700, Matthew Wilcox wrote:
I recently introduced memset32() / memset64().  I've done implementations
for x86 & ARM; akpm has agreed to take the patchset through his tree.
Do you fancy doing a powerpc version?  Minchan Kim got a 7% performance
increase with zram from switching to the optimised version on x86.
+Anton

Thanks Matthew !
quoted hunk ↗ jump to hunk
Here's the development git tree:
http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/memfill
(most recent 7 commits)

ARM probably offers the best model for you to work from; it's basically
just a case of jumping into the middle of your existing memset loop.
It was only three instructions to add support to ARM, but I don't know
PowerPC well enough to understand how your existing memset works.
I'd start with something like this ... note that you don't have to
implement memset64 on 32-bit; I only did it on ARM because it was free.
It doesn't look free for you as you only store one register each time
around the loop in the 32-bit memset implementation:

1:      stwu    r4,4(r6)
        bdnz    1b

(wouldn't you get better performance on 32-bit powerpc by unrolling that
loop like you do on 64-bit?)
diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h
index da3cdffca440..c02392fced98 100644
--- a/arch/powerpc/include/asm/string.h
+++ b/arch/powerpc/include/asm/string.h
@@ -6,6 +6,7 @@
 #define __HAVE_ARCH_STRNCPY
 #define __HAVE_ARCH_STRNCMP
 #define __HAVE_ARCH_MEMSET
+#define __HAVE_ARCH_MEMSET_PLUS
 #define __HAVE_ARCH_MEMCPY
 #define __HAVE_ARCH_MEMMOVE
 #define __HAVE_ARCH_MEMCMP
@@ -23,6 +24,18 @@ extern void * memmove(void *,const void *,__kernel_size_t);
 extern int memcmp(const void *,const void *,__kernel_size_t);
 extern void * memchr(const void *,int,__kernel_size_t);
 
+extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
+static inline void *memset32(uint32_t *p, uint32_t v, __kernel_size_t n)
+{
quoted
+	return __memset32(p, v, n * 4);
+}
+
+extern void *__memset64(uint64_t *, uint64_t v, __kernel_size_t);
+static inline void *memset64(uint64_t *p, uint64_t v, __kernel_size_t n)
+{
quoted
+	return __memset64(p, v, n * 8);
+}
+
 #endif /* __KERNEL__ */
 
quoted
 #endif	/* _ASM_POWERPC_STRING_H */
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help