Thread (67 messages, 7 authors, 2018-09-10)

Re: [PATCH V3 13/26] csky: Library functions

From: Guo Ren <hidden>
Date: 2018-09-07 05:08:17
Also in: lkml

On Thu, Sep 06, 2018 at 04:24:59PM +0200, Arnd Bergmann wrote:
> On Wed, Sep 5, 2018 at 2:08 PM Guo Ren [off-list ref] wrote:
>> --- /dev/null
>> +++ b/arch/csky/abiv1/memset.c
>> @@ -0,0 +1,38 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +// Copyright (C) 2018 Hangzhou C-SKY Microsystems co.,ltd.
>> +#include <linux/types.h>
>> +
>> +void *memset(void *dest, int c, size_t l)
>> +{
>> +       char *d = dest;
>> +       int ch = c;
>> +       int tmp;
>> +
>> +       if ((long)d & 0x3)
>> +               while (l--) *d++ = ch;
>> +       else {
>> +               ch &= 0xff;
>> +               tmp = (ch | ch << 8 | ch << 16 | ch << 24);
>> +
>> +               while (l >= 16) {
>> +                       *(((long *)d)) = tmp;
>> +                       *(((long *)d)+1) = tmp;
>> +                       *(((long *)d)+2) = tmp;
>> +                       *(((long *)d)+3) = tmp;
>> +                       l -= 16;
>> +                       d += 16;
>> +               }
>> +
>> +               while (l > 3) {
>> +                       *(((long *)d)) = tmp;
>> +                       d = d + 4;
>> +                       l -= 4;
>> +               }
>> +
>> +               while (l) {
>> +                       *d++ = ch;
>> +                       l--;
>> +               }
>> +       }
>> +       return dest;
>> +}
> I see that we have a trivial memset() implementation in lib/string.c, but yours
> seems to be better optimized. Where did you get it from?
We wrote it for our ck610 to improve performance, but I think many other
architectures already implement memset in assembly.
> Is this a version
> that works particularly well on C-Sky, or is this a generic optimized memset
> that others could use as well?
We have only tested it on C-SKY, but I think it would also perform better than
the current lib/string.c memset on other architectures' CPUs.
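For illustration, here is a portable variant of the same idea, sketched under a few assumptions. The name memset_wordwise is hypothetical, and the word size comes from unsigned long rather than being hard-coded to 4. Unaligned buffers are byte-filled only up to the first word boundary, instead of being handled entirely with byte stores as in the patch above. Word stores go through a fixed-size memcpy so that the sketch has no strict-aliasing problems in user space. Compilers typically lower that memcpy to a single store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store one word; a constant-size memcpy is typically lowered to one store. */
static inline void store_word(unsigned char *p, unsigned long v)
{
	memcpy(p, &v, sizeof(v));
}

/* Hypothetical generic word-at-a-time memset (not the patch's code). */
void *memset_wordwise(void *dest, int c, size_t n)
{
	const size_t wsz = sizeof(unsigned long);
	unsigned char *d = dest;
	unsigned char ch = (unsigned char)c;
	/* Replicate ch into every byte: (~0UL / 0xff) is 0x0101...01. */
	unsigned long pattern = (~0UL / 0xff) * ch;

	/* Head: byte stores until d reaches a word boundary. */
	while (n > 0 && ((uintptr_t)d & (wsz - 1)) != 0) {
		*d++ = ch;
		n--;
	}

	/* Body: four consecutive word stores per iteration, as in the patch. */
	while (n >= 4 * wsz) {
		store_word(d, pattern);
		store_word(d + wsz, pattern);
		store_word(d + 2 * wsz, pattern);
		store_word(d + 3 * wsz, pattern);
		d += 4 * wsz;
		n -= 4 * wsz;
	}

	/* Remaining whole words. */
	while (n >= wsz) {
		store_word(d, pattern);
		d += wsz;
		n -= wsz;
	}

	/* Tail: leftover bytes. */
	while (n > 0) {
		*d++ = ch;
		n--;
	}

	return dest;
}
```

Whether this beats an architecture's hand-written assembly depends on the CPU. The sketch only illustrates the structure a generic lib/string.c candidate could take.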

I see that in lib/string.c:
void *memset(void *s, int c, size_t count)
{
	char *xs = s;

	while (count--)
		*xs++ = c;
	return s;
}
The main problem is "char *xs": every byte is written with its own "st.b"
(store byte) instruction, and "st.b" is very slow.

Our key improvement is:
>> +                       *(((long *)d)) = tmp;
>> +                       *(((long *)d)+1) = tmp;
>> +                       *(((long *)d)+2) = tmp;
>> +                       *(((long *)d)+3) = tmp;
These four back-to-back word stores let the SoC use an AXI burst transfer.
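To quantify the difference, the sketch below counts the store instructions each strategy issues for one fill. It models control flow only, not cycles. Whether consecutive word stores actually merge into an AXI burst depends on the SoC. The struct and function names are hypothetical, and the model assumes a 4-byte long, as on 32-bit C-SKY.

```c
#include <stddef.h>
#include <stdint.h>

/* Store-instruction counts for one fill (a model, not a cycle count). */
struct store_counts {
	size_t byte_stores; /* st.b */
	size_t word_stores; /* 32-bit word stores */
	size_t blocks16;    /* 16-byte blocks written by four back-to-back word stores */
};

/* Mirrors the control flow of the C-SKY patch, assuming sizeof(long) == 4. */
struct store_counts patched_memset_stores(uintptr_t addr, size_t len)
{
	struct store_counts s = { 0, 0, 0 };

	if (addr & 0x3) {
		/* Unaligned start: the patch uses byte stores for the whole fill. */
		s.byte_stores = len;
		return s;
	}
	s.blocks16 = len / 16;                     /* while (l >= 16) loop */
	len %= 16;
	s.word_stores = s.blocks16 * 4 + len / 4;  /* unrolled loop + while (l > 3) */
	s.byte_stores = len % 4;                   /* final while (l) loop */
	return s;
}

/* The trivial lib/string.c loop issues one byte store per byte. */
struct store_counts trivial_memset_stores(size_t len)
{
	struct store_counts s = { len, 0, 0 };
	return s;
}
```

For an aligned 4096-byte page, patched_memset_stores(0x1000, 4096) reports 256 16-byte blocks (1024 word stores) and no byte stores. The trivial loop issues 4096 byte stores for the same page. An unaligned start such as 0x1001 sends the patch down its all-byte path.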
> In the latter case, we could add it to
> lib/string.c and let architectures select it in place of the trivial version.
Good idea.

 Guo Ren