Re: [PATCH] lib/string: Bring optimized memcmp from glibc
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: 2021-07-21 18:45:58
On Wed, Jul 21, 2021 at 11:17 AM Nikolay Borisov [off-list ref] wrote:
> I find it somewhat arbitrary that we choose to align the 2nd pointer and not the first.
Yeah, that's a bit odd, but I don't think it matters. The hope is obviously that they are mutually aligned, and in that case it doesn't matter which one you aim to align.
> So you are saying that the current memcmp could indeed use improvement but you don't want it to be based on the glibc's code due to the ugly misalignment handling?
Yeah. I suspect that this (very simple) patch gives you the same
performance improvement that the glibc code does.
NOTE! I'm not saying this patch is perfect. This one doesn't even
_try_ to do the mutual alignment, because it's really silly. But I'm
throwing this out here for discussion, because
- it's really simple
- I suspect it gets you 99% of the way there
- the code generation is actually quite good with both gcc and clang.
This is gcc:
  memcmp:
	jmp	.L60
  .L52:
	movq	(%rsi), %rax
	cmpq	%rax, (%rdi)
	jne	.L53
	addq	$8, %rdi
	addq	$8, %rsi
	subq	$8, %rdx
  .L60:
	cmpq	$7, %rdx
	ja	.L52
	testq	%rdx, %rdx
	je	.L61
  .L53:
	xorl	%ecx, %ecx
	jmp	.L56
  .L62:
	addq	$1, %rcx
	cmpq	%rcx, %rdx
	je	.L51
  .L56:
	movzbl	(%rdi,%rcx), %eax
	movzbl	(%rsi,%rcx), %r8d
	subl	%r8d, %eax
	je	.L62
  .L51:
	ret
  .L61:
	xorl	%eax, %eax
	ret
and notice how there are no spills, no extra garbage, just simple and
straightforward code.
Those things end up mattering too - it's good for I$, it's good for the
small cases, and it's good for debugging and reading the code.
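
For readers without the attachment, here is a rough userspace sketch of the kind of loop the assembly above corresponds to: compare a word at a time while at least a full word remains, then let a plain byte loop handle the tail and locate any mismatch. This is not the attached patch. The names memcmp_words and load_word are made up for illustration, and memcpy() stands in for whatever unaligned-load helper the real code would use.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: unaligned word load, expressed via memcpy(). */
static unsigned long load_word(const void *p)
{
	unsigned long w;

	memcpy(&w, p, sizeof(w));
	return w;
}

/* Hypothetical name: word-at-a-time compare with a byte-wise tail. */
int memcmp_words(const void *cs, const void *ct, size_t count)
{
	const unsigned char *a = cs, *b = ct;
	int res = 0;

	/* Fast path: skip over whole words for as long as they are equal. */
	while (count >= sizeof(unsigned long)) {
		if (load_word(a) != load_word(b))
			break;	/* the byte loop below finds the differing byte */
		a += sizeof(unsigned long);
		b += sizeof(unsigned long);
		count -= sizeof(unsigned long);
	}

	/* Tail, or the word that differed: plain byte-by-byte compare. */
	for (; count > 0; a++, b++, count--) {
		res = *a - *b;
		if (res != 0)
			break;
	}
	return res;
}
```

Note that the word compare only tests for equality, so byte order never enters into it; the sign of the result always comes from the byte loop.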
If this is "good enough" for your test-case, I really would prefer
something like this. "Make it as simple as possible, but no simpler".
I can do the mutual alignment too, but I'd actually prefer to do it as
a separate patch, for when there are numbers for that.
And I wouldn't do it as a byte-by-byte case, because that's just stupid.
I'd do it as a single separate first step: "get an unaligned word from
both sources, compare them for equality, and then only add enough bytes
to align".
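
A hedged sketch of what that first step might look like, again in userspace C and not taken from any actual patch: one unaligned word compare up front, then advance by only as many bytes as it takes to put the first pointer on a word boundary. The names memcmp_align_first, load_word and cmp_bytes are hypothetical, and aligning the first pointer rather than the second is an arbitrary choice here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unaligned word load, expressed via memcpy(). */
static unsigned long load_word(const void *p)
{
	unsigned long w;

	memcpy(&w, p, sizeof(w));
	return w;
}

/* Hypothetical helper: byte loop used for the tail and to locate a mismatch. */
static int cmp_bytes(const unsigned char *a, const unsigned char *b, size_t n)
{
	for (; n > 0; a++, b++, n--) {
		int res = *a - *b;

		if (res != 0)
			return res;
	}
	return 0;
}

int memcmp_align_first(const void *cs, const void *ct, size_t count)
{
	const unsigned char *a = cs, *b = ct;
	const size_t w = sizeof(unsigned long);

	if (count >= w) {
		/* One unaligned compare covers the first w bytes. */
		if (load_word(a) != load_word(b))
			return cmp_bytes(a, b, w);

		/*
		 * Those w bytes are equal, so stepping forward by 1..w bytes
		 * is safe. Pick the step that lands 'a' on a word boundary.
		 */
		size_t skip = w - ((uintptr_t)a & (w - 1));

		a += skip;
		b += skip;
		count -= skip;

		/* 'a' is now aligned; 'b' is too only if mutually aligned. */
		while (count >= w) {
			if (load_word(a) != load_word(b))
				break;
			a += w;
			b += w;
			count -= w;
		}
	}
	return cmp_bytes(a, b, count);
}
```

The overlap between the first unaligned word and the first aligned one means a few bytes may get compared twice, which is the price of not doing the alignment byte by byte.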
Linus

Attachments:
- patch.diff [text/x-patch] 975 bytes