Re: [PATCH] riscv: use the generic string routines

From: Matteo Croce <hidden>
Date: 2021-09-19 19:14:04
Also in: linux-riscv, lkml

On Mon, Sep 13, 2021 at 1:35 PM David Laight [off-list ref] wrote:

quoted

These ended up getting rejected by Linus, so I'm going to hold off on
this for now.  If they're really out of lib/ then I'll take the C
routines in arch/riscv, but either way it's an issue for the next
release.

Agree, we should take the C routine in arch/riscv for common
implementation. If any vendor what custom implementation they could
use the alternative framework in errata for string operations.

I though the asm ones were significantly faster because
they were less affected by read latency.

(But they were horribly broken for misaligned transfers.)

I can get the same exact performance (and a very similar machine code)
in C with this on top of the C memset implementation:

--- a/arch/riscv/lib/string.c
+++ b/arch/riscv/lib/string.c

@@ -112,9 +112,12 @@ EXPORT_SYMBOL(__memmove);
 void *memmove(void *dest, const void *src, size_t count) __weak

__alias(__memmove);
 EXPORT_SYMBOL(memmove);

+#define BATCH 4
+
 void *__memset(void *s, int c, size_t count)
 {
  union types dest = { .as_u8 = s };
+ int i;

  if (count >= MIN_THRESHOLD) {
  unsigned long cu = (unsigned long)c;

@@ -138,8 +141,12 @@ void *__memset(void *s, int c, size_t count)
  }

  /* Copy using the largest size allowed */
- for (; count >= BYTES_LONG; count -= BYTES_LONG)
- *dest.as_ulong++ = cu;
+ for (; count >= BYTES_LONG * BATCH; count -= BYTES_LONG * BATCH) {
+#pragma GCC unroll 4
+     for (i = 0; i < BATCH; i++)
+         dest.as_ulong[i] = cu;
+     dest.as_ulong += BATCH;
+ }
  }

On the BeagleV the memset speed with the different batch size are:

1 (stock): 267 Mb/s
2: 272 Mb/s
4: 276 Mb/s
8: 276 Mb/s

The problem with biggest batch size is that it will fallback to a
single byte copy if the buffers are too small.

Regards,
-- 
per aspera ad upstream

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help