Re: [PATCH 2/3] arm64: Optimize __READ_ONCE() with CONFIG_LTO=y
From: Marco Elver <elver@google.com>
Date: 2026-01-26 23:16:15
Also in: lkml, llvm
On Mon, 26 Jan 2026 at 12:16, David Laight [off-list ref] wrote:
On Mon, 26 Jan 2026 01:25:11 +0100 Marco Elver [off-list ref] wrote:
Rework arm64 LTO __READ_ONCE() to improve code generation as follows:

1. Replace the _Generic-based __unqual_scalar_typeof() with the builtin typeof_unqual(). This strips qualifiers from all types, not just integer types, which is required to be able to assign (must be non-const) to __u.__val in the non-atomic case (required for #2).

   One subtle point here is that non-integer types of __val could be const or volatile within the union with the old __unqual_scalar_typeof(), if the passed variable is const or volatile. This would then result in a forced load from the stack if __u.__val is volatile; in the case of const, it does look odd if the underlying storage changes, but the compiler is told said member is "const" -- it smells like UB.

2. Eliminate the atomic flag and ternary conditional expression. Move the fallback volatile load into the default case of the switch, ensuring __u is unconditionally initialized across all paths. The statement expression now unconditionally returns __u.__val.

Does it even need to be a union? I think (eg):

	TYPEOF_UNQUAL(*__x) __val; \
	...
	: "=r" (*(__u32 *)&__val) \

will have the same effect (might need an __force for sparse).
Unsure, but we might be treading on UB even with -fno-strict-aliasing given all the inline asm around here.
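For reference, the qualifier point in item 1 of the quoted description can be seen in isolation with a small userspace sketch. The macro below is a cut-down stand-in for the _Generic-based helper (not the kernel's exact definition), and struct pair plus the two helper functions are hypothetical names made up for illustration. It assumes GCC or clang, since it relies on __typeof__.

```c
#include <assert.h>

/* Hypothetical non-scalar type, used only for this illustration. */
struct pair { int a; int b; };

/*
 * Cut-down stand-in for a _Generic-based "unqualified scalar typeof".
 * The controlling expression undergoes lvalue conversion, so a
 * "const int" selects the "int" case and the qualifier is dropped.
 * Anything not listed falls to "default: (x)", whose type keeps
 * whatever qualifiers x has.
 */
#define unqual_scalar_typeof(x) __typeof__(				\
	_Generic((x),							\
		 char: (char)0,						\
		 signed char: (signed char)0,				\
		 unsigned char: (unsigned char)0,			\
		 short: (short)0,					\
		 unsigned short: (unsigned short)0,			\
		 int: 0,						\
		 unsigned int: 0u,					\
		 long: 0L,						\
		 unsigned long: 0UL,					\
		 long long: 0LL,					\
		 unsigned long long: 0ULL,				\
		 default: (x)))

/* Returns 1 if the derived type for a const int is still const. */
int scalar_keeps_const(void)
{
	const int src = 42;
	unqual_scalar_typeof(src) tmp = src;

	static_assert(sizeof(tmp) == sizeof(src), "same underlying type size");
	return _Generic(&tmp, const int *: 1, int *: 0);
}

/* Returns 1 if the derived type for a const struct is still const. */
int struct_keeps_const(void)
{
	const struct pair src = { 1, 2 };
	unqual_scalar_typeof(src) tmp = src;	/* still const: "tmp = src;" later would not compile */

	static_assert(sizeof(tmp) == sizeof(src), "same underlying type size");
	return _Generic(&tmp, const struct pair *: 1, struct pair *: 0);
}
```

With this stand-in, scalar_keeps_const() returns 0 and struct_keeps_const() returns 1, which is the situation where a union member derived from a const struct cannot be assigned to. A typeof_unqual()-based definition would strip the qualifier in both cases.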
Also is the 'default' branch even needed? READ_ONCE() rejects sizes other than 1, 2, 4 and 8. A quick search only found one oversize read - for 'struct vcpu_runstate_info' in arch/x86/kvm/xen.c. Requiring that code use a different define might make sense.

I also did some x86-64 build timings with compiletime_assert_rwonce_type() commented out. Expanding and compiling that check seems to add just over 1% to the build time. So anything to shrink that define is likely to be noticeable.
The compiletime_assert_rwonce_type() check is for the benefit of the asm-generic variant, whose default implementation looks like the 'default' case here. This patch only touches the arm64 override of all that with LTO.
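To make the two shapes being compared concrete, here is a rough userspace sketch, not the actual kernel code. Plain volatile loads stand in for the arm64 inline asm, static_assert() stands in for compiletime_assert_rwonce_type(), and every name ending in _sketch, struct triple and the demo_* helpers are hypothetical. It assumes GCC or clang (statement expressions, __typeof__), a build with -fno-strict-aliasing as in the kernel, and a non-const argument, because __typeof__ keeps qualifiers and the union member must be assignable.

```c
#include <assert.h>
#include <stdint.h>

/* Typically 12 bytes, so it takes the 'default' case below. */
struct triple { int a, b, c; };

/* Generic shape: reject odd sizes at compile time, then one volatile load. */
#define read_once_generic_sketch(x)					\
({									\
	static_assert(sizeof(x) == 1 || sizeof(x) == 2 ||		\
		      sizeof(x) == 4 || sizeof(x) == 8,			\
		      "unsupported size for read_once_generic_sketch");	\
	*(const volatile __typeof__(x) *)&(x);				\
})

/*
 * Switch shape: every path initializes __u, and the result is always
 * __u.__val. Sized union members are used here instead of casting a
 * pointer into the union, which is an assumption of this sketch.
 */
#define read_once_switch_sketch(x)					\
({									\
	__typeof__(&(x)) __x = &(x);					\
	union {								\
		__typeof__(*__x) __val;					\
		uint8_t __b; uint16_t __h; uint32_t __w; uint64_t __q;	\
	} __u;								\
	switch (sizeof(x)) {						\
	case 1: __u.__b = *(volatile uint8_t *)__x; break;		\
	case 2: __u.__h = *(volatile uint16_t *)__x; break;		\
	case 4: __u.__w = *(volatile uint32_t *)__x; break;		\
	case 8: __u.__q = *(volatile uint64_t *)__x; break;		\
	default: __u.__val = *(volatile __typeof__(*__x) *)__x;		\
	}								\
	__u.__val;							\
})

/* 4-byte read through the sized path. */
uint32_t demo_word(void)
{
	uint32_t v = 0xdeadbeefu;

	return read_once_switch_sketch(v);
}

/* Oversize read through the 'default' fallback. */
int demo_struct(void)
{
	struct triple t = { 1, 2, 3 };
	struct triple r = read_once_switch_sketch(t);

	return r.a + r.b + r.c;
}

/* 8-byte read through the generic shape. */
uint64_t demo_generic(void)
{
	uint64_t v = 0x0123456789abcdefull;

	return read_once_generic_sketch(v);
}
```

Passing struct triple to read_once_generic_sketch() fails the static assertion, while read_once_switch_sketch() accepts it through the 'default' case. Passing a const object to read_once_switch_sketch() does not compile, which is the assignability problem typeof_unqual() is meant to remove.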