Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
From: Arnd Bergmann <arnd@kernel.org>
Date: 2021-05-18 15:42:18
Also in:
linux-arch, linux-arm-kernel, linux-crypto, lkml
On Tue, May 18, 2021 at 4:56 PM Linus Torvalds [off-list ref] wrote:
On Tue, May 18, 2021 at 12:27 AM Arnd Bergmann [off-list ref] wrote:quoted
quoted
I wonder if the kernel should do the same, or whether there are still cases where memcpy() isn't compiled optimally. armv6/7 used to be one such case, but it was fixed in gcc 6.It would have to be memmove(), not memcpy() in this case, right?No, it would simply be something like #define __get_unaligned_t(type, ptr) \ ({ type __val; memcpy(&__val, ptr, sizeof(type)); __val; }) #define get_unaligned(ptr) \ __get_unaligned_t(typeof(*(ptr)), ptr) but honestly, the likelihood that the compiler generates something horrible (possibly because of KASAN etc) is uncomfortably high. I'd prefer the __packed thing. We don't actually use -O3, and it's considered a bad idea, and the gcc bug is as such less likely than just the above generating unacceptable code (we have several cases where "bad code generation" ends up being an actual bug, since we depend on inlining and depend on some code sequences not generating calls etc).
I think the important question is whether we know that the bug that Eric pointed to can only happen with -O3, or whether it is something in gcc-10.1 that got triggered by -O3 -msse on x86-64 but could equally well get triggered on some other architecture without -O3 or vector instructions enabled. From the gcc fix at https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9fa5b473b5b8e289b it looks like this code path is entered when compiling with -ftree-loop-vectorize, which is documented as '-ftree-loop-vectorize' Perform loop vectorization on trees. This flag is enabled by default at '-O3' and by '-ftree-vectorize', '-fprofile-use', and '-fauto-profile'. -ftree-vectorize is set in arch/arm/lib/xor-neon.c -O3 is set for the lz4 and zstd compression helpers and for wireguard. To be on the safe side, we could pass -fno-tree-loop-vectorize along with -O3 on the affected gcc versions, or use a bigger hammer (not use -O3 at all, always set -fno-tree-loop-vectorize, ...). Arnd