Thread (22 messages) 22 messages, 8 authors, 2021-08-26

Re: [PATCH] lib/string: Bring optimized memcmp from glibc

From: Linus Torvalds <torvalds@linux-foundation.org>
Date: 2021-07-21 18:45:58
Also in: lkml

On Wed, Jul 21, 2021 at 11:17 AM Nikolay Borisov [off-list ref] wrote:
I find it somewhat arbitrary that we choose to align the 2nd pointer and
not the first.
Yeah, that's a bit odd, but I don't think it matters.

The hope is obviously that they are mutually aligned, and in that case
it doesn't matter which one you aim to align.
So you are saying that the current memcmp could indeed use improvement
but you don't want it to be based on the glibc's code due to the ugly
misalignment handling?
Yeah. I suspect that this (very simple) patch gives you the same
performance improvement that the glibc code does.

NOTE! I'm not saying this patch is perfect. This one doesn't even
_try_ to do the mutual alignment, because it's really silly. But I'm
throwing this out here for discussion, because

 - it's really simple

 - I suspect it gets you 99% of the way there

 - the code generation is actually quite good with both gcc and clang.
This is gcc:

        memcmp:
                jmp     .L60
        .L52:
                movq    (%rsi), %rax
                cmpq    %rax, (%rdi)
                jne     .L53
                addq    $8, %rdi
                addq    $8, %rsi
                subq    $8, %rdx
        .L60:
                cmpq    $7, %rdx
                ja      .L52
                testq   %rdx, %rdx
                je      .L61
        .L53:
                xorl    %ecx, %ecx
                jmp     .L56
        .L62:
                addq    $1, %rcx
                cmpq    %rcx, %rdx
                je      .L51
        .L56:
                movzbl  (%rdi,%rcx), %eax
                movzbl  (%rsi,%rcx), %r8d
                subl    %r8d, %eax
                je      .L62
        .L51:
                ret
        .L61:
                xorl    %eax, %eax
                ret

and notice how there are no spills, no extra garbage, just simple and
straightforward code.

Those things ends mattering too - it's good for I$, it's good for the
small cases, and it's good for debugging and reading the code.

If this is "good enough" for your test-case, I really would prefer
something like this. "Make it as simple as possible, but no simpler"

I can do the mutual alignment too, but I'd actually prefer to do it as
a separate patch, for when there are numbers for that.

And I wouldn't do it as a byte-by-byte case, because that's just stupid.

I'd do it using a separate first single "get unaligned word from both
sources, compare them for equality, and then only add enough bytes to
align"

                  Linus

Attachments

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help