Re: [PATCH] sched/membarrier: Fix redundant load of membarrier_state
From: Segher Boessenkool <hidden>
Date: 2024-10-25 03:36:14
Also in:
lkml, llvm
Hi! On Fri, Oct 25, 2024 at 11:29:38AM +1100, Michael Ellerman wrote:
[To += Mathieu] "Nysal Jan K.A." [off-list ref] writes:quoted
From: "Nysal Jan K.A" <redacted> On architectures where ARCH_HAS_SYNC_CORE_BEFORE_USERMODE is not selected, sync_core_before_usermode() is a no-op. In membarrier_mm_sync_core_before_usermode() the compiler does not eliminate redundant branches and the load of mm->membarrier_state for this case as the atomic_read() cannot be optimized away.I was wondering if this was caused by powerpc's arch_atomic_read() which uses asm volatile. But replacing arch_atomic_read() with READ_ONCE() makes no difference, presumably because the compiler still can't see that the READ_ONCE() is unnecessary (which is kind of by design).
Exactly.
quoted
GCC 12.2.1: ----------- add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-32 (-32) Function old new delta finish_task_switch.isra 852 820 -32GCC 12 is a couple of years old, I assume GCC 14 behaves similarly?
GCC 12 is still being actively developed. There will be a 12.5 probably halfway next year (and that will be the last 12.x release, yes). The GCC homepage (<https://gcc.gnu.org>) will tell you what releases are still maintained/supported, and sometimes you can derive our planned plans from there as well :-) But yes, 14 is similar (I did not test, but I feel confident making that assertion :-) )
quoted
static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm) { + if (!IS_ENABLED(CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE)) + return; if (current->mm != mm) return; if (likely(!(atomic_read(&mm->membarrier_state) &The other option would be to have a completely separate stub, eg: #ifdef CONFIG_ARCH_HAS_SYNC_CORE_BEFORE_USERMODE static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm) { if (current->mm != mm) return; if (likely(!(atomic_read(&mm->membarrier_state) & MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE))) return; sync_core_before_usermode(); } #else static inline void membarrier_mm_sync_core_before_usermode(struct mm_struct *mm) { } #endif Not sure what folks prefer. In either case I think it's probably worth a short comment explaining why it's worth the trouble (ie. that the atomic_read() prevents the compiler from doing DCE).
Since you ask, I like the proposed change (the inline one) best. But yeah, comment please! (And it is not about DCE -- just the definition of __READ_ONCE makes it directly impossible to CSE any expressions with this, it (standards- violatingly) casts the pointers to pointers to volatile, and you cannot CSE any accesses to volatile objects!) So what are the actual semantics the kernel wants from its READ_ONCE, and from its atomics in general? GCC has perfectly fine in-compiler support for such things, there is no need for making a second rate manual implementation of parts of this, when you can use a good implementation of everything instead! Segher