Thread: 22 messages, 8 authors, 2021-08-26

Re: [PATCH] lib/string: Bring optimized memcmp from glibc

From: Nikolay Borisov <hidden>
Date: 2021-07-22 05:54:20
Also in: lkml


On 21.07.21 г. 23:27, Linus Torvalds wrote:
On Wed, Jul 21, 2021 at 1:13 PM David Sterba [off-list ref] wrote:
adding a memcmp_large that compares by native words or u64 could be
the best option.
Yeah, we could just special-case that one place.
This whole thread started because I first implemented a special case just
for dedupe, and Dave Chinner suggested that, instead of playing
whack-a-mole, we get something decent for the generic memcmp so that we
get an improvement across the whole of the kernel.
But see the patches I sent out - I think we can get the best of both worlds.

A small and simple memcmp() that is good enough and not the
_completely_ stupid thing we have now.

The second patch I sent out even gets the mutually aligned case right.

Of course, the glibc code also ended up unrolling things a bit, but
honestly, the way it did it was too disgusting for words.

And if it really turns out that the unrolling makes a big difference -
although I doubt it's meaningful with any modern core - I can add a
couple of lines to that simple patch I sent out to do that too.
Without getting the monster that is that glibc code.

Of course, my patch depends on the fact that "get_unaligned()" is
cheap on all CPU's that really matter, and that caches aren't
direct-mapped any more. The glibc code seems to be written for a world
where registers are cheap, unaligned accesses are prohibitively
expensive, and unrolling helps because L1 caches are direct-mapped and
you really want to do chunking to not get silly way conflicts.

If old-style Sparc or MIPS was our primary target, that would be one
thing. But it really isn't.

              Linus
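
For readers following along, here is a minimal userspace sketch of the
word-at-a-time idea discussed above. It is not the patch from this thread.
The names memcmp_words_sketch() and load_unaligned_u64() are hypothetical,
and the memcpy()-based load is only an assumed stand-in for the kernel's
get_unaligned(). The sketch compares 8 bytes at a time while the words are
equal, then falls back to a byte loop so the sign of the result does not
depend on endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's get_unaligned(): memcpy() lets the
 * compiler emit a plain load on CPUs where unaligned access is cheap. */
static inline uint64_t load_unaligned_u64(const unsigned char *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/* Hypothetical name; a sketch, not the function from the patches. */
static int memcmp_words_sketch(const void *a, const void *b, size_t n)
{
	const unsigned char *pa = a;
	const unsigned char *pb = b;

	/* Skip over 8-byte chunks for as long as they are identical. */
	while (n >= sizeof(uint64_t)) {
		if (load_unaligned_u64(pa) != load_unaligned_u64(pb))
			break;	/* the difference is inside this word */
		pa += sizeof(uint64_t);
		pb += sizeof(uint64_t);
		n -= sizeof(uint64_t);
	}

	/* Resolve the differing word, or the short tail, byte by byte. */
	while (n) {
		int d = (int)*pa - (int)*pb;

		if (d)
			return d;
		pa++;
		pb++;
		n--;
	}
	return 0;
}
```

Finding the mismatch with a byte loop costs at most 8 extra iterations and
avoids any byte-swapping logic. There is no alignment handling and no
unrolling here, which matches the assumption stated above that unaligned
loads are cheap on the CPUs that matter.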