Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
From: Arnd Bergmann <arnd@kernel.org>
Date: 2021-05-18 07:27:11
Also in:
linux-arm-kernel, linux-crypto, linux-mips, lkml
On Mon, May 17, 2021 at 11:53 PM Eric Biggers [off-list ref] wrote:
On Fri, May 14, 2021 at 12:00:55PM +0200, Arnd Bergmann wrote:quoted
From: Arnd Bergmann <arnd@arndb.de> As found by Vineet Gupta and Linus Torvalds, gcc has somewhat unexpected behavior when faced with overlapping unaligned pointers. The kernel's unaligned/access-ok.h header technically invokes undefined behavior that happens to usually work on the architectures using it, but if the compiler optimizes code based on the assumption that undefined behavior doesn't happen, it can create output that actually causes data corruption. A related problem was previously found on 32-bit ARMv7, where most instructions can be used on unaligned data, but 64-bit ldrd/strd causes an exception. The workaround was to always use the unaligned/le_struct.h helper instead of unaligned/access-ok.h, in commit 1cce91dfc8f7 ("ARM: 8715/1: add a private asm/unaligned.h"). The same solution should work on all other architectures as well, so remove the access-ok.h variant and use the other one unconditionally on all architectures, picking either the big-endian or little-endian version.FYI, gcc 10 had a bug where it miscompiled code that uses "packed structs" to copy between overlapping unaligned pointers (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994).
Thank you for pointing this out
I'm not sure whether the kernel will run into that or not, and gcc has since fixed it. But it's worth mentioning, especially since the issue mentioned in this commit sounds very similar (overlapping unaligned pointers), and both involved implementations of DEFLATE decompression.
I tried reproducing this on the kernel deflate code with the kernel.org gcc-10.1 and gcc-10.3 crosstool versions but couldn't quite get there with Vineet's preprocessed source https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363 Trying with both the original get_unaligned() version in there and the packed-struct variant, I get the same output from gcc-10.1 and gcc-10.3 when I compile those myself for arc hs4x , but it's rather different from the output that Vineet got and I don't know how to spot whether the problem exists in any of those versions.
Anyway, partly due to the above, in userspace I now only use memcpy() to
implement {get,put}_unaligned_*, since these days it seems to be compiled
optimally and have the least amount of problems.
I wonder if the kernel should do the same, or whether there are still cases
where memcpy() isn't compiled optimally. armv6/7 used to be one such case, but
it was fixed in gcc 6.
It would have to be memmove(), not memcpy() in this case, right?
My feeling is that if gcc-4.9 and gcc-5 produce correct but slightly slower
code, we can live with that, unlike the possibility of gcc-10.{1,2} producing
incorrect code.
Since the new asm/unaligned.h has a single implementation across all
architectures, we could probably fall back to a memmove based version for
the compilers affected by the 94994 bug, but I'd first need to have a better
way to test regarding whether given combination of asm/unaligned.h and
compiler version runs into this bug.
I have checked your reproducer and confirmed that it does affect x86_64
gcc-10.1 -O3 with my proposed version of asm-generic/unaligned.h, but
does not trigger on any other version (4.9 though 9.3, 10.3 or 11.1), and not
on -O2 or "-O3 -mno-sse" builds or on arm64, but that doesn't necessarily
mean it's safe on these.
Arnd