Re: [PATCH] riscv: use the generic string routines
From: Matteo Croce <hidden>
Date: 2021-09-19 19:14:04
Also in:
linux-riscv, lkml
On Mon, Sep 13, 2021 at 1:35 PM David Laight [off-list ref] wrote:
> > > These ended up getting rejected by Linus, so I'm going to hold off on
> > > this for now. If they're really out of lib/ then I'll take the C
> > > routines in arch/riscv, but either way it's an issue for the next
> > > release.
> >
> > Agree, we should take the C routine in arch/riscv for common
> > implementation. If any vendor wants a custom implementation they could
> > use the alternative framework in errata for string operations.
>
> I thought the asm ones were significantly faster because they were
> less affected by read latency.
> (But they were horribly broken for misaligned transfers.)

I can get exactly the same performance (and very similar machine code) in C
with this on top of the C memset implementation:

--- a/arch/riscv/lib/string.c
+++ b/arch/riscv/lib/string.c
@@ -112,9 +112,12 @@ EXPORT_SYMBOL(__memmove);
 void *memmove(void *dest, const void *src, size_t count) __weak __alias(__memmove);
 EXPORT_SYMBOL(memmove);
 
+#define BATCH 4
+
 void *__memset(void *s, int c, size_t count)
 {
 	union types dest = { .as_u8 = s };
+	int i;
 
 	if (count >= MIN_THRESHOLD) {
 		unsigned long cu = (unsigned long)c;
@@ -138,8 +141,12 @@ void *__memset(void *s, int c, size_t count)
 		}
 		/* Copy using the largest size allowed */
-		for (; count >= BYTES_LONG; count -= BYTES_LONG)
-			*dest.as_ulong++ = cu;
+		for (; count >= BYTES_LONG * BATCH; count -= BYTES_LONG * BATCH) {
+#pragma GCC unroll 4
+			for (i = 0; i < BATCH; i++)
+				dest.as_ulong[i] = cu;
+			dest.as_ulong += BATCH;
+		}
 	}

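For anyone who wants to try the batching idea outside the kernel, here is a
minimal userspace sketch. It is not the kernel's __memset: the name
batched_memset is hypothetical, the alignment prologue is a simplified
stand-in for the part of the function not shown in the diff, and the word
stores go through memcpy() so the code stays well-defined under strict
aliasing (the kernel builds with -fno-strict-aliasing and can store through
the pointer directly). Anything left over after the batched loop, including
any buffer shorter than BYTES_LONG * BATCH, is filled one byte at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BATCH      4                      /* words stored per outer iteration */
#define BYTES_LONG sizeof(unsigned long)

/* Hypothetical userspace illustration, not the kernel implementation. */
static void *batched_memset(void *s, int c, size_t count)
{
	unsigned char *p = s;
	unsigned long cu = (unsigned char)c;
	size_t i;

	assert(s != NULL);

	/* Replicate the fill byte into every byte of the word. */
	cu *= ~0UL / 0xff;

	/* Byte stores until the destination is word-aligned. */
	while (count && ((uintptr_t)p & (BYTES_LONG - 1))) {
		*p++ = (unsigned char)c;
		count--;
	}

	/* Store BATCH words per iteration; the inner loop is unrolled. */
	for (; count >= BYTES_LONG * BATCH; count -= BYTES_LONG * BATCH) {
#pragma GCC unroll 4
		for (i = 0; i < BATCH; i++)
			memcpy(p + i * BYTES_LONG, &cu, BYTES_LONG);
		p += BYTES_LONG * BATCH;
	}

	/* Tail: whatever does not fill a whole batch goes byte by byte. */
	while (count--)
		*p++ = (unsigned char)c;

	return s;
}
```

With BATCH set to 4 on a 64-bit machine, the batched loop only runs for
counts of at least 32 bytes after alignment; shorter fills are handled
entirely by the byte loops.
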
On the BeagleV the memset speeds with the different batch sizes are:

1 (stock): 267 Mb/s
2:         272 Mb/s
4:         276 Mb/s
8:         276 Mb/s

The problem with the biggest batch size is that it will fall back to a
single-byte copy if the buffers are too small.

Regards,
-- 
per aspera ad upstream