Thread (10 messages) 10 messages, 5 authors, 2021-05-18

Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers

From: Arnd Bergmann <arnd@kernel.org>
Date: 2021-05-18 15:42:18
Also in: linux-arch, linux-arm-kernel, linux-crypto, lkml

On Tue, May 18, 2021 at 4:56 PM Linus Torvalds
[off-list ref] wrote:
On Tue, May 18, 2021 at 12:27 AM Arnd Bergmann [off-list ref] wrote:
quoted
quoted
I wonder if the kernel should do the same, or whether there are still cases
where memcpy() isn't compiled optimally.  armv6/7 used to be one such case, but
it was fixed in gcc 6.
It would have to be memmove(), not memcpy() in this case, right?
No, it would simply be something like

  #define __get_unaligned_t(type, ptr) \
        ({ type __val; memcpy(&__val, ptr, sizeof(type)); __val; })

  #define get_unaligned(ptr) \
        __get_unaligned_t(typeof(*(ptr)), ptr)

but honestly, the likelihood that the compiler generates something
horrible (possibly because of KASAN etc) is uncomfortably high.

I'd prefer the __packed thing. We don't actually use -O3, and it's
considered a bad idea, and the gcc bug is as such less likely than
just  the above generating unacceptable code (we have several cases
where "bad code generation" ends up being an actual bug, since we
depend on inlining and depend on some code sequences not generating
calls etc).
I think the important question is whether we know that the bug that Eric
pointed to can only happen with -O3, or whether it is something in
gcc-10.1 that got triggered by -O3 -msse on x86-64 but could equally
well get triggered on some other architecture without -O3 or
vector instructions enabled.

From the gcc fix at
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9fa5b473b5b8e289b
it looks like this code path is entered when compiling with
-ftree-loop-vectorize, which is documented as

'-ftree-loop-vectorize'
     Perform loop vectorization on trees.  This flag is enabled by
     default at '-O3' and by '-ftree-vectorize', '-fprofile-use', and
     '-fauto-profile'.

-ftree-vectorize is set in arch/arm/lib/xor-neon.c
-O3 is set for the lz4 and zstd compression helpers and for wireguard.

To be on the safe side, we could pass -fno-tree-loop-vectorize along
with -O3 on the affected gcc versions, or use a bigger hammer
(not use -O3 at all, always set -fno-tree-loop-vectorize, ...).

        Arnd
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help