Re: [PATCH v8 04/12] arm64: support WFET in smp_cond_relaxed_timeout()
From: Will Deacon <will@kernel.org>
Date: 2026-01-09 14:16:24
Also in:
bpf, linux-arch, linux-pm, lkml
On Fri, Jan 09, 2026 at 01:05:57AM -0800, Ankur Arora wrote:
Will Deacon [off-list ref] writes:quoted
On Sun, Dec 14, 2025 at 08:49:11PM -0800, Ankur Arora wrote:quoted
+#define __CMPWAIT_CASE(w, sfx, sz) \ +static inline void __cmpwait_case_##sz(volatile void *ptr, \ + unsigned long val, \ + s64 timeout_ns) \ +{ \ + unsigned long tmp; \ + \ + if (!alternative_has_cap_unlikely(ARM64_HAS_WFXT) || timeout_ns <= 0) { \ + asm volatile( \ + " sevl\n" \ + " wfe\n" \ + " ldxr" #sfx "\t%" #w "[tmp], %[v]\n" \ + " eor %" #w "[tmp], %" #w "[tmp], %" #w "[val]\n" \ + " cbnz %" #w "[tmp], 1f\n" \ + " wfe\n" \ + "1:" \ + : [tmp] "=&r" (tmp), [v] "+Q" (*(u##sz *)ptr) \ + : [val] "r" (val)); \ + } else { \ + u64 ecycles = arch_timer_read_counter() + \ + NSECS_TO_CYCLES(timeout_ns); \ + asm volatile( \ + " sevl\n" \ + " wfe\n" \ + " ldxr" #sfx "\t%" #w "[tmp], %[v]\n" \ + " eor %" #w "[tmp], %" #w "[tmp], %" #w "[val]\n" \ + " cbnz %" #w "[tmp], 2f\n" \ + " msr s0_3_c1_c0_0, %[ecycles]\n" \ + "2:" \ + : [tmp] "=&r" (tmp), [v] "+Q" (*(u##sz *)ptr) \ + : [val] "r" (val), [ecycles] "r" (ecycles)); \ + } \Why not have a separate helper for the WFXT version and avoid the runtime check on timeout_ns?My main reason for keeping them together was that a separate helper needed duplication of a lot of the __CMPWAIT_CASE and __CMPWAIT_GEN stuff. Relooking at it, I think we could get by without duplicating the __CMPWAIT_GEN (the WFE helper just needs to take an unused timeout_ns paramter). But, it seems overkill to get rid of the unnecessary check on timeout_ns (which AFAICT should be well predicted) and the duplicate static branch.
tbh, I was actually struggling to see what the check achieves. In fact, why is 'timeout_ns' signed in the first place? Has BPF invented time travel now? :p If the requested timeout is 0 then we should return immediately (or issue a WFET which will wake up immediately). If the requested timeout is negative, then WFET may end up with a really long timeout, but that should still be no worse than a plain WFE. If the check is only there to de-multiplex __cmpwait() vs __cmpwait_relaxed_timeout() as the caller, then I think we should just avoid muxing them in the first place. The duplication argument doesn't hold a lot of water when the asm block is already mostly copy-paste. Will