Thread (121 messages) 121 messages, 13 authors, 2021-09-24

Re: [RFC] LKMM: Add volatile_if()

From: Will Deacon <will@kernel.org>
Date: 2021-06-04 10:44:08
Also in: linux-toolchains, lkml

On Fri, Jun 04, 2021 at 12:12:07PM +0200, Peter Zijlstra wrote:
With optimizing compilers becoming more and more agressive and C so far
refusing to acknowledge the concept of control-dependencies even while
we keep growing the amount of reliance on them, things will eventually
come apart.

There have been talks with toolchain people on how to resolve this; one
suggestion was allowing the volatile qualifier on branch statements like
'if', but so far no actual compiler has made any progress on this.

Rather than waiting any longer, provide our own construct based on that
suggestion. The idea is by Alan Stern and refined by Paul and myself.

Code generation is sub-optimal (for the weak architectures) since we're
forced to convert the condition into another and use a fixed conditional
branch instruction, but shouldn't be too bad.

Usage of volatile_if requires the @cond to be headed by a volatile load
(READ_ONCE() / atomic_read() etc..) such that the compiler is forced to
emit the load and the branch emitted will have the required
data-dependency. Furthermore, volatile_if() is a compiler barrier, which
should prohibit the compiler from lifting anything out of the selection
statement.
When building with LTO on arm64, we already upgrade READ_ONCE() to an RCpc
acquire. In this case, it would be really good to avoid having the dummy
conditional branch somehow, but I can't see a good way to achieve that.
quoted hunk ↗ jump to hunk
This construct should place control dependencies on a stronger footing
until such time that the compiler folks get around to accepting them :-)

I've converted most architectures we care about, and the rest will get
an extra smp_mb() by means of the 'generic' fallback implementation (for
now).

I've converted the control dependencies I remembered and those found
with a search for smp_acquire__after_ctrl_dep(), there might be more.

Compile tested only (alpha, arm, arm64, x86_64, powerpc, powerpc64, s390
and sparc64).

Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm/include/asm/barrier.h      | 11 +++++++++++
 arch/arm64/include/asm/barrier.h    | 11 +++++++++++
 arch/powerpc/include/asm/barrier.h  | 13 +++++++++++++
 arch/s390/include/asm/barrier.h     |  3 +++
 arch/sparc/include/asm/barrier_64.h |  3 +++
 arch/x86/include/asm/barrier.h      | 16 ++++++++++++++++
 include/asm-generic/barrier.h       | 38 ++++++++++++++++++++++++++++++++++++-
 include/linux/refcount.h            |  2 +-
 ipc/mqueue.c                        |  2 +-
 ipc/msg.c                           |  2 +-
 kernel/events/ring_buffer.c         |  8 ++++----
 kernel/locking/rwsem.c              |  4 ++--
 kernel/sched/core.c                 |  2 +-
 kernel/smp.c                        |  2 +-
 14 files changed, 105 insertions(+), 12 deletions(-)
diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 83ae97c049d9..de8a61479268 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -97,6 +97,17 @@ static inline unsigned long array_index_mask_nospec(unsigned long idx,
 #define array_index_mask_nospec array_index_mask_nospec
 #endif
 
+/* Guarantee a conditional branch that depends on @cond. */
+static __always_inline _Bool volatile_cond(_Bool cond)
+{
+	asm_volatile_goto("teq %0, #0; bne %l[l_yes]"
+			  : : "r" (cond) : "cc", "memory" : l_yes);
+	return 0;
+l_yes:
+	return 1;
+}
+#define volatile_cond volatile_cond
+
 #include <asm-generic/barrier.h>
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index 451e11e5fd23..2782a7013615 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -156,6 +156,17 @@ do {									\
 	(typeof(*p))__u.__val;						\
 })
 
+/* Guarantee a conditional branch that depends on @cond. */
+static __always_inline _Bool volatile_cond(_Bool cond)
Is _Bool to fix some awful header mess?
+{
+	asm_volatile_goto("cbnz %0, %l[l_yes]"
+			  : : "r" (cond) : "cc", "memory" : l_yes);
+	return 0;
+l_yes:
+	return 1;
+}
nit: you don't need the "cc" clobber here.
quoted hunk ↗ jump to hunk
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 640f09479bdf..a84833f1397b 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -187,6 +187,42 @@ do {									\
 #define virt_store_release(p, v) __smp_store_release(p, v)
 #define virt_load_acquire(p) __smp_load_acquire(p)
 
+/*
+ * 'Generic' wrapper to make volatile_if() below 'work'. Architectures are
+ * encouraged to provide their own implementation. See x86 for TSO and arm64
+ * for a weak example.
+ */
+#ifndef volatile_cond
+#define volatile_cond(cond)	({ bool __t = (cond); smp_mb(); __t; })
+#endif
+
+/**
+ * volatile_if() - Provide a control-dependency
+ *
+ * volatile_if(READ_ONCE(A))
+ *	WRITE_ONCE(B, 1);
+ *
+ * will ensure that the STORE to B happens after the LOAD of A. Normally a
+ * control dependency relies on a conditional branch having a data dependency
+ * on the LOAD and an architecture's inability to speculate STOREs. IOW, this
+ * provides a LOAD->STORE order.
+ *
+ * Due to optimizing compilers extra care is needed; as per the example above
+ * the LOAD must be 'volatile' qualified in order to ensure the compiler
+ * actually emits the load, such that the data-dependency to the conditional
+ * branch can be formed.
+ *
+ * Secondly, the compiler must be prohibited from lifting anything out of the
+ * selection statement, as this would obviously also break the ordering.
+ *
+ * Thirdly, and this is the tricky bit, architectures that allow the
+ * LOAD->STORE reorder must ensure the compiler actually emits the conditional
+ * branch instruction, this isn't possible in generic.
+ *
+ * See the volatile_cond() wrapper.
+ */
+#define volatile_if(cond) if (volatile_cond(cond))
The thing I really dislike about this is that, if the compiler _does_
emit a conditional branch for the C 'if', then we get a pair of branch
instructions in close proximity to each other which the predictor is likely
to hate. I wouldn't be surprised if an RCpc acquire heading the dependency
actually performs better on modern arm64 cores in the general case.

So I think that's an argument for doing this in the compiler...

Will
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help