Thread (46 messages) 46 messages, 11 authors, 2021-12-17

Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers

From: Arnd Bergmann <arnd@kernel.org>
Date: 2021-05-18 07:27:11
Also in: linux-arm-kernel, linux-crypto, linux-mips, lkml

On Mon, May 17, 2021 at 11:53 PM Eric Biggers [off-list ref] wrote:
On Fri, May 14, 2021 at 12:00:55PM +0200, Arnd Bergmann wrote:
quoted
From: Arnd Bergmann <arnd@arndb.de>

As found by Vineet Gupta and Linus Torvalds, gcc has somewhat unexpected
behavior when faced with overlapping unaligned pointers. The kernel's
unaligned/access-ok.h header technically invokes undefined behavior
that happens to usually work on the architectures using it, but if the
compiler optimizes code based on the assumption that undefined behavior
doesn't happen, it can create output that actually causes data corruption.

A related problem was previously found on 32-bit ARMv7, where most
instructions can be used on unaligned data, but 64-bit ldrd/strd causes
an exception. The workaround was to always use the unaligned/le_struct.h
helper instead of unaligned/access-ok.h, in commit 1cce91dfc8f7 ("ARM:
8715/1: add a private asm/unaligned.h").

The same solution should work on all other architectures as well, so
remove the access-ok.h variant and use the other one unconditionally on
all architectures, picking either the big-endian or little-endian version.
FYI, gcc 10 had a bug where it miscompiled code that uses "packed structs" to
copy between overlapping unaligned pointers
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994).
Thank you for pointing this out
I'm not sure whether the kernel will run into that or not, and gcc has since
fixed it.  But it's worth mentioning, especially since the issue mentioned in
this commit sounds very similar (overlapping unaligned pointers), and both
involved implementations of DEFLATE decompression.
I tried reproducing this on the kernel deflate code with the kernel.org gcc-10.1
and gcc-10.3 crosstool versions but couldn't quite get there with Vineet's
preprocessed source https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

Trying with both the original get_unaligned() version in there and the
packed-struct
variant, I get the same output from gcc-10.1 and gcc-10.3 when I compile those
myself for arc hs4x , but it's rather different from the output that Vineet got
and I don't know how to spot whether the problem exists in any of those
versions.
Anyway, partly due to the above, in userspace I now only use memcpy() to
implement {get,put}_unaligned_*, since these days it seems to be compiled
optimally and have the least amount of problems.

I wonder if the kernel should do the same, or whether there are still cases
where memcpy() isn't compiled optimally.  armv6/7 used to be one such case, but
it was fixed in gcc 6.
It would have to be memmove(), not memcpy() in this case, right?
My feeling is that if gcc-4.9 and gcc-5 produce correct but slightly slower
code, we can live with that, unlike the possibility of gcc-10.{1,2} producing
incorrect code.

Since the new asm/unaligned.h has a single implementation across all
architectures, we could probably fall back to a memmove based version for
the compilers affected by the 94994 bug,  but I'd first need to have a better
way to test regarding whether given combination of asm/unaligned.h and
compiler version runs into this bug.

I have checked your reproducer and confirmed that it does affect x86_64
gcc-10.1 -O3 with my proposed version of asm-generic/unaligned.h, but
does not trigger on any other version (4.9 though 9.3, 10.3 or 11.1), and not
on -O2 or "-O3 -mno-sse" builds or on arm64, but that doesn't necessarily
mean it's safe on these.

        Arnd
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help