Thread (15 messages), 3 authors, 2026-01-27

Re: [PATCH 2/3] arm64: Optimize __READ_ONCE() with CONFIG_LTO=y

From: Marco Elver <elver@google.com>
Date: 2026-01-26 23:16:15
Also in: lkml, llvm

On Mon, 26 Jan 2026 at 12:16, David Laight [off-list ref] wrote:
On Mon, 26 Jan 2026 01:25:11 +0100
Marco Elver [off-list ref] wrote:
Rework arm64 LTO __READ_ONCE() to improve code generation as follows:

1. Replace the _Generic-based __unqual_scalar_typeof() with the builtin
   typeof_unqual(). This strips qualifiers from all types, not just
   integer types, which is required to be able to assign (must be
   non-const) to __u.__val in the non-atomic case (required for #2).

One subtle point here is that non-integer types of __val could be const
or volatile within the union with the old __unqual_scalar_typeof(), if
the passed variable is const or volatile. This would then result in a
forced load from the stack if __u.__val is volatile; in the case of
const, it does look odd if the underlying storage changes, but the
compiler is told said member is "const" -- it smells like UB.
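
To make the qualifier point concrete, here is a minimal sketch, assuming a reduced stand-in for the _Generic-based macro (sketch_unqual_scalar_typeof() and struct sketch_pair are hypothetical names, not the kernel's definitions). It shows that a const integer loses its qualifier while a const struct falls through to the default branch and stays const, which is the case typeof_unqual() is meant to fix:

```c
#include <assert.h>

/* Hypothetical, reduced stand-in for a _Generic-based scalar unqualifier.
 * Integer types are mapped to an unqualified rvalue of the same type;
 * everything else falls through to "default" and keeps its qualifiers. */
#define sketch_unqual_scalar_typeof(x) __typeof__(                  \
	_Generic((x),                                               \
		char: (char)0,                                      \
		signed char: (signed char)0,                        \
		unsigned char: (unsigned char)0,                    \
		short: (short)0,                                    \
		unsigned short: (unsigned short)0,                  \
		int: (int)0,                                        \
		unsigned int: (unsigned int)0,                      \
		long: (long)0,                                      \
		unsigned long: (unsigned long)0,                    \
		long long: (long long)0,                            \
		unsigned long long: (unsigned long long)0,          \
		default: (x)))

struct sketch_pair { int a; int b; };	/* hypothetical non-integer type */

/* Returns 1 if the temporary derived from a const int is a plain int. */
int sketch_scalar_is_stripped(void)
{
	const int src = 1;
	sketch_unqual_scalar_typeof(src) tmp = 0;

	tmp = src;	/* only compiles because tmp is not const */
	return _Generic(&tmp, int *: 1, default: 0) && tmp == 1;
}

/* Returns 1 if the temporary derived from a const struct is still const. */
int sketch_struct_keeps_const(void)
{
	const struct sketch_pair src = { 1, 2 };
	sketch_unqual_scalar_typeof(src) tmp = src;	/* initialised, never assignable */

	assert(tmp.a == 1 && tmp.b == 2);
	return _Generic(&tmp, const struct sketch_pair *: 1, default: 0);
}
```

With a const member of this kind inside the union, a later assignment to __u.__val would not compile; a volatile one would compile but force the access through memory.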

2. Eliminate the atomic flag and ternary conditional expression. Move
   the fallback volatile load into the default case of the switch,
   ensuring __u is unconditionally initialized across all paths.
   The statement expression now unconditionally returns __u.__val.
Does it even need to be a union?
I think (eg):
        TYPEOF_UNQUAL(*__x) __val;      \
        ...
                : "=r" (*(__u32 *)&__val)       \
will have the same effect (might need an __force for sparse).
Unsure, but we might be treading on UB even with -fno-strict-aliasing
given all the inline asm around here.
Also is the 'default' branch even needed?
READ_ONCE() rejects sizes other than 1, 2, 4 and 8.
A quick search only found one oversize read - for 'struct vcpu_runstate_info'
in arch/x86/kvm/xen.c
Requiring that code use a different define might make sense.
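
For reference, the size rule being discussed can be sketched with a plain _Static_assert. This is an illustration only: the SKETCH_* names and struct sketch_oversize are hypothetical, and the real check goes through the kernel's compiletime_assert() machinery rather than _Static_assert.

```c
#include <assert.h>

/* Hypothetical mirror of the "native word or long long" size rule. */
#define SKETCH_NATIVE_WORD(t)						\
	(sizeof(t) == sizeof(char) || sizeof(t) == sizeof(short) ||	\
	 sizeof(t) == sizeof(int)  || sizeof(t) == sizeof(long))

#define SKETCH_RWONCE_SIZE_OK(t)					\
	(SKETCH_NATIVE_WORD(t) || sizeof(t) == sizeof(long long))

/* Build-time form: fails the compile for unsupported sizes. */
#define sketch_assert_rwonce_type(t)					\
	_Static_assert(SKETCH_RWONCE_SIZE_OK(t),			\
		       "Unsupported access size")

/* Hypothetical oversize type, standing in for a multi-word structure. */
struct sketch_oversize { long a; long b; long c; };

/* Returns 1 if an int passes the rule and the oversize struct does not. */
int sketch_size_checks(void)
{
	int v = 0;

	sketch_assert_rwonce_type(v);	/* accepted at build time */
	assert(v == 0);
	return SKETCH_RWONCE_SIZE_OK(v) &&
	       !SKETCH_RWONCE_SIZE_OK(struct sketch_oversize);
}
```

Applying sketch_assert_rwonce_type() to struct sketch_oversize would stop the build, which is why an oversize reader would need a separate define if the 'default' branch were dropped.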

I also did some x86-64 build timings with compiletime_assert_rwonce_type()
commented out.
Expanding and compiling that check seems to add just over 1% to the
build time.
So anything to shrink that define is likely to be noticeable.
The compiletime_assert_rwonce_type() check is for the benefit of the
asm-generic variant, which by default is implemented like the 'default'
case here. This is only the arm64 override of all that with LTO.
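
The control flow described in point 2 of the patch description can be sketched in portable C as follows. This is a rough illustration under stated assumptions, not the arm64 macro: sketch_read_once() and struct sketch_big are hypothetical names, the GCC/Clang __atomic_load_n() builtin with acquire ordering stands in for the arm64 load-acquire inline asm, plain __typeof__ is used instead of typeof_unqual() (so the sketch only accepts unqualified objects), and the pointer casts assume -fno-strict-aliasing for non-integer types, as in kernel builds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: every switch path initialises __u, and the
 * statement expression unconditionally yields __u.__val. */
#define sketch_read_once(x)						\
({									\
	__typeof__(&(x)) __p = &(x);					\
	union {								\
		__typeof__(x) __val;					\
		uint8_t __b1; uint16_t __b2;				\
		uint32_t __b4; uint64_t __b8;				\
	} __u;								\
	switch (sizeof(x)) {						\
	case 1:								\
		__u.__b1 = __atomic_load_n((uint8_t *)__p,		\
					   __ATOMIC_ACQUIRE);		\
		break;							\
	case 2:								\
		__u.__b2 = __atomic_load_n((uint16_t *)__p,		\
					   __ATOMIC_ACQUIRE);		\
		break;							\
	case 4:								\
		__u.__b4 = __atomic_load_n((uint32_t *)__p,		\
					   __ATOMIC_ACQUIRE);		\
		break;							\
	case 8:								\
		__u.__b8 = __atomic_load_n((uint64_t *)__p,		\
					   __ATOMIC_ACQUIRE);		\
		break;							\
	default:	/* fallback: plain volatile load */		\
		__u.__val = *(volatile __typeof__(x) *)__p;		\
	}								\
	__u.__val;							\
})

struct sketch_big { long a, b, c; };	/* hypothetical oversize type */

/* Exercises one sized path and the default path; returns 1 on success. */
int sketch_read_once_demo(void)
{
	uint32_t word = 0x12345678u;
	struct sketch_big big = { 1, 2, 3 };
	uint32_t w = sketch_read_once(word);
	struct sketch_big s = sketch_read_once(big);

	assert(w == 0x12345678u);
	assert(s.a == 1 && s.b == 2 && s.c == 3);
	return 1;
}
```

The assignment in the default branch is the reason __val must not be const, which is what ties point 2 back to the qualifier stripping in point 1.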