Thread (96 messages) 96 messages, 15 authors, 2013-11-08

Re: [RFC] arch: Introduce new TSO memory barrier smp_tmb()

From: Will Deacon <hidden>
Date: 2013-11-05 14:05:59
Also in: lkml

On Mon, Nov 04, 2013 at 08:53:44PM +0000, Paul E. McKenney wrote:
On Mon, Nov 04, 2013 at 08:11:27PM +0100, Peter Zijlstra wrote:
Some comments below.  I believe that opcodes need to be fixed for IA64.
I am unsure of the ifdefs and opcodes for arm64, but the ARM folks should
be able to tell us.
[...]
quoted
diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 60f15e274e6d..a804093d6891 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -53,10 +53,36 @@
 #define smp_mb()     barrier()
 #define smp_rmb()    barrier()
 #define smp_wmb()    barrier()
+
+#define smp_store_release(p, v)                                              \
+do {                                                                 \
+     smp_mb();                                                       \
+     ACCESS_ONCE(p) = (v);                                           \
+} while (0)
+
+#define smp_load_acquire(p, v)                                               \
+do {                                                                 \
+     typeof(p) ___p1 = ACCESS_ONCE(p);                               \
+     smp_mb();                                                       \
+     return ___p1;                                                   \
+} while (0)
What data sizes do these accessors operate on? Assuming that we want
single-copy atomicity (with respect to interrupts in the UP case), we
probably want a check to stop people passing in things like structs.
quoted
 #else
 #define smp_mb()     dmb(ish)
 #define smp_rmb()    smp_mb()
 #define smp_wmb()    dmb(ishst)
+
Seems like there should be some sort of #ifdef condition to distinguish
between these.  My guess is something like:

#if __LINUX_ARM_ARCH__ > 7

But I must defer to the ARM guys.  For all I know, they might prefer
arch/arm to stick with smp_mb() and have arch/arm64 do the ldar and stlr.
Yes. For arch/arm/, I'd rather we stick with the smp_mb() for the time
being. We don't (yet) have any 32-bit ARMv8 support, and the efforts towards
a single zImage could do without minor variations like this, not to mention
the usual backlash I get whenever introducing something that needs a
relatively recent binutils.
quoted
+#define smp_store_release(p, v)                                              \
+do {                                                                 \
+     asm volatile ("stlr %w0 [%1]" : : "r" (v), "r" (&p) : "memory");\
+} while (0)
+
+#define smp_load_acquire(p)                                          \
+do {                                                                 \
+     typeof(p) ___p1;                                                \
+     asm volatile ("ldar %w0, [%1]"                                  \
+                     : "=r" (___p1) : "r" (&p) : "memory");          \
+     return ___p1;                                                   \
+} while (0)
 #endif

 #define read_barrier_depends()               do { } while(0)
diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index d4a63338a53c..0da2d4ebb9a8 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -35,10 +35,38 @@
 #define smp_mb()     barrier()
 #define smp_rmb()    barrier()
 #define smp_wmb()    barrier()
+
+#define smp_store_release(p, v)                                              \
+do {                                                                 \
+     smp_mb();                                                       \
+     ACCESS_ONCE(p) = (v);                                           \
+} while (0)
+
+#define smp_load_acquire(p, v)                                               \
+do {                                                                 \
+     typeof(p) ___p1 = ACCESS_ONCE(p);                               \
+     smp_mb();                                                       \
+     return ___p1;                                                   \
+} while (0)
+
 #else
+
 #define smp_mb()     asm volatile("dmb ish" : : : "memory")
 #define smp_rmb()    asm volatile("dmb ishld" : : : "memory")
 #define smp_wmb()    asm volatile("dmb ishst" : : : "memory")
+
+#define smp_store_release(p, v)                                              \
+do {                                                                 \
+     asm volatile ("stlr %w0 [%1]" : : "r" (v), "r" (&p) : "memory");\
Missing comma between the operands. Also, that 'w' output modifier enforces
a 32-bit store (same early question about sizes). Finally, it might be more
efficient to use "=Q" for the addressing mode, rather than take the address
of p manually.
quoted
+} while (0)
+
+#define smp_load_acquire(p)                                          \
+do {                                                                 \
+     typeof(p) ___p1;                                                \
+     asm volatile ("ldar %w0, [%1]"                                  \
+                     : "=r" (___p1) : "r" (&p) : "memory");          \
+     return ___p1;                                                   \
Similar comments here wrt Q constraint.

Random other question: have you considered how these accessors should behave
when presented with __iomem pointers?

Will
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help