Thread: 23 messages, 6 authors, 2021-06-18

Re: [PATCH 1/3] riscv: optimized memcpy

From: Bin Meng <hidden>
Date: 2021-06-15 13:29:37
Also in: linux-riscv, lkml

On Tue, Jun 15, 2021 at 9:18 PM David Laight [off-list ref] wrote:
From: Bin Meng
Sent: 15 June 2021 14:09

On Tue, Jun 15, 2021 at 4:57 PM David Laight [off-list ref] wrote:
...
I'm surprised that the C loop:
+             for (; count >= bytes_long; count -= bytes_long)
+                     *d.ulong++ = *s.ulong++;
ends up being faster than the ASM 'read lots' - 'write lots' loop.
I believe that's because the assembly version has some unaligned
access cases, which end up being trap-n-emulated in the OpenSBI
firmware, and that is a big overhead.
Ah, that would make sense since the asm user copy code
was broken for misaligned copies.
I suspect memcpy() was broken the same way.
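
As a rough illustration of the point above, here is a minimal sketch of a copy routine that uses long-sized accesses only when source and destination share the same word alignment, so no misaligned load or store is issued that firmware would have to trap and emulate. The name copy_words_if_aligned and the word_t typedef are hypothetical and are not taken from the patch under discussion; may_alias is a GCC/Clang extension, assumed here so the word accesses are safe outside a -fno-strict-aliasing build.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical word type; may_alias (GCC/Clang extension) lets it alias bytes. */
typedef unsigned long __attribute__((may_alias)) word_t;

/* Illustrative helper, not the patch's memcpy. */
static void *copy_words_if_aligned(void *dst, const void *src, size_t count)
{
	unsigned char *d = dst;
	const unsigned char *s = src;
	const uintptr_t mask = sizeof(word_t) - 1;

	/* Word accesses are only possible if both pointers are misaligned
	 * by the same amount; otherwise fall through to the byte loop. */
	if ((((uintptr_t)d ^ (uintptr_t)s) & mask) == 0) {
		/* Copy leading bytes until both pointers are word aligned. */
		for (; count && ((uintptr_t)d & mask); count--)
			*d++ = *s++;

		/* Aligned body: one word per iteration. */
		for (; count >= sizeof(word_t); count -= sizeof(word_t)) {
			*(word_t *)d = *(const word_t *)s;
			d += sizeof(word_t);
			s += sizeof(word_t);
		}
	}

	/* Trailing bytes, or the whole buffer when alignments differ. */
	for (; count; count--)
		*d++ = *s++;

	return dst;
}
```

When the two alignments differ, this sketch simply degrades to a byte loop; a tuned implementation would more likely combine aligned loads with shifts, which is outside the scope of this example.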
Yes, Gary Guo sent a patch against the broken assembly version a long
time ago, but that patch has still not been applied as of today.
https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

I suggest Matteo re-test using Gary's version.
I'm surprised NET_IP_ALIGN isn't set to 2 to try to
avoid all these misaligned copies in the network stack.
Although avoiding 8n+4 aligned data is rather harder.
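
The alignment arithmetic behind this remark can be sketched as follows (the kernel's receive padding macro is spelled NET_IP_ALIGN). The sketch assumes a plain 14-byte Ethernet header with no VLAN tag, a 20-byte IPv4 header with no options, and a receive buffer that starts on an 8-byte boundary. The helper names are hypothetical, and reading "8n+4" as the offset of whatever follows the IP header is an interpretation, not something the thread states.

```c
#include <stddef.h>

/* Assumed sizes: untagged Ethernet, IPv4 without options. */
enum { ETH_HDR_LEN = 14, IPV4_HDR_LEN = 20 };

/* Offset of the IP header from an aligned buffer start, given the
 * number of padding bytes reserved in front of the Ethernet header. */
static size_t ip_header_offset(size_t pad)
{
	return pad + ETH_HDR_LEN;
}

/* Offset of the header that follows the IPv4 header (e.g. TCP or UDP). */
static size_t l4_header_offset(size_t pad)
{
	return ip_header_offset(pad) + IPV4_HDR_LEN;
}
```

Under these assumptions, no padding puts the IP header at offset 14, which is only 2-byte aligned, while 2 bytes of padding put it at offset 16. The following header then lands at offset 36, which is of the form 8n+4: fine for 32-bit accesses but misaligned for 64-bit ones.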

Misaligned copies are just best avoided - really even on x86.
The 'real fun' is when the access crosses TLB boundaries.
Regards,
Bin