Re: [PATCH 1/3] riscv: optimized memcpy

From: Bin Meng <hidden>
Date: 2021-06-15 13:29:37
Also in: linux-riscv, lkml

On Tue, Jun 15, 2021 at 9:18 PM David Laight [off-list ref] wrote:

From: Bin Meng

Sent: 15 June 2021 14:09

On Tue, Jun 15, 2021 at 4:57 PM David Laight [off-list ref] wrote:

...

I'm surprised that the C loop:

+             for (; count >= bytes_long; count -= bytes_long)
+                     *d.ulong++ = *s.ulong++;

ends up being faster than the ASM 'read lots' - 'write lots' loop.

I believe that's because the assembly version has some unaligned
access cases, which end up being trap-n-emulated in the OpenSBI
firmware, and that is a big overhead.
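
To make the alignment point concrete, here is a minimal sketch, not the code from the patch: a copy routine that only enters the word loop when source and destination can be word-aligned together, so it never issues a misaligned load or store. The name `sketch_memcpy` and the `word_t` typedef are hypothetical, the `may_alias` attribute is a GCC/Clang extension, and falling back to a byte loop for mutually misaligned buffers is a simplification chosen for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: allows accessing byte buffers through unsigned long. */
typedef unsigned long __attribute__((may_alias)) word_t;

/* Hypothetical helper for illustration only. */
void *sketch_memcpy(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;
	const uintptr_t mask = sizeof(word_t) - 1;

	/* Word copies are used only if both pointers share the same offset within a word. */
	if ((((uintptr_t)d ^ (uintptr_t)s) & mask) == 0) {
		/* Copy leading bytes until dest (and therefore src) is word-aligned. */
		while (count && ((uintptr_t)d & mask)) {
			*d++ = *s++;
			count--;
		}
		/* Both pointers are aligned here, so every access is a natural word access. */
		for (; count >= sizeof(word_t); count -= sizeof(word_t)) {
			*(word_t *)d = *(const word_t *)s;
			d += sizeof(word_t);
			s += sizeof(word_t);
		}
	}

	/* Tail bytes, or the whole buffer when the pointers are mutually misaligned. */
	for (; count; count--)
		*d++ = *s++;

	return dest;
}
```

On a platform where misaligned accesses trap into firmware, a loop of this shape should avoid the emulation path entirely, at the cost of a slow byte loop in the mutually misaligned case.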

Ah, that would make sense since the asm user copy code
was broken for misaligned copies.
I suspect memcpy() was broken the same way.

Yes, Gary Guo sent a patch a long time ago to fix the broken assembly
version, but that patch has still not been applied as of today.
https://patchwork.kernel.org/project/linux-riscv/patch/20210216225555.4976-1-gary@garyguo.net/

I suggest Matteo re-test using Gary's version.

I'm surprised NET_IP_ALIGN isn't set to 2 to try to
avoid all these misaligned copies in the network stack.
Although avoiding 8n+4 aligned data is rather harder.
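
The arithmetic behind the 2-byte pad and the 8n+4 case can be sketched as follows. This assumes a receive buffer that starts on an 8-byte boundary, a 14-byte Ethernet header, and option-free 20-byte IPv4 and TCP headers; the function and macro names are hypothetical, and the 12-byte figure for TCP options is an assumption (a timestamp option padded to a multiple of 4).

```c
#include <stddef.h>

#define SKETCH_ETH_HLEN 14 /* destination MAC + source MAC + EtherType */
#define SKETCH_IP_HLEN  20 /* IPv4 header without options */
#define SKETCH_TCP_HLEN 20 /* TCP header without options */

/* Offset of the IP header from the (8-byte aligned) start of the buffer. */
static size_t sketch_ip_offset(size_t pad)
{
	return pad + SKETCH_ETH_HLEN;
}

/* Offset of the TCP payload, given the pad and the length of any TCP options. */
static size_t sketch_payload_offset(size_t pad, size_t tcp_opt_len)
{
	return pad + SKETCH_ETH_HLEN + SKETCH_IP_HLEN + SKETCH_TCP_HLEN + tcp_opt_len;
}
```

Under these assumptions, a pad of 0 puts the IP header at offset 14 (2 mod 4), while a pad of 2 puts it at offset 16. With the pad in place the payload starts at offset 56 without options, but at offset 68 with 12 bytes of options, which is the 8n+4 case.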

Misaligned copies are just best avoided - really even on x86.
The 'real fun' is when the access crosses TLB boundaries.
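
As a rough illustration of the boundary case, the check below reports whether an access of a given size touches two pages, and so may need two translations. It reads "TLB boundaries" as page boundaries and assumes 4 KiB pages; `sketch_crosses_page` and `SKETCH_PAGE_SIZE` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */

/* True if the bytes [addr, addr + size) span more than one page. */
static bool sketch_crosses_page(uintptr_t addr, size_t size)
{
	if (size == 0)
		return false;
	return (addr / SKETCH_PAGE_SIZE) != ((addr + size - 1) / SKETCH_PAGE_SIZE);
}
```

A naturally aligned word access can never satisfy this check, which is one more reason to keep copies aligned.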

Regards,
Bin
