Re: [RFC] arch: Introduce new TSO memory barrier smp_tmb()
From: Will Deacon <hidden>
Date: 2013-11-05 14:05:59
Also in:
lkml
On Mon, Nov 04, 2013 at 08:53:44PM +0000, Paul E. McKenney wrote:
On Mon, Nov 04, 2013 at 08:11:27PM +0100, Peter Zijlstra wrote: Some comments below. I believe that opcodes need to be fixed for IA64. I am unsure of the ifdefs and opcodes for arm64, but the ARM folks should be able to tell us.
[...]
quoted
diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h index 60f15e274e6d..a804093d6891 100644 --- a/arch/arm/include/asm/barrier.h +++ b/arch/arm/include/asm/barrier.h@@ -53,10 +53,36 @@ #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() + +#define smp_store_release(p, v) \ +do { \ + smp_mb(); \ + ACCESS_ONCE(p) = (v); \ +} while (0) + +#define smp_load_acquire(p, v) \ +do { \ + typeof(p) ___p1 = ACCESS_ONCE(p); \ + smp_mb(); \ + return ___p1; \ +} while (0)
What data sizes do these accessors operate on? Assuming that we want single-copy atomicity (with respect to interrupts in the UP case), we probably want a check to stop people passing in things like structs.
quoted
#else #define smp_mb() dmb(ish) #define smp_rmb() smp_mb() #define smp_wmb() dmb(ishst) +Seems like there should be some sort of #ifdef condition to distinguish between these. My guess is something like: #if __LINUX_ARM_ARCH__ > 7 But I must defer to the ARM guys. For all I know, they might prefer arch/arm to stick with smp_mb() and have arch/arm64 do the ldar and stlr.
Yes. For arch/arm/, I'd rather we stick with the smp_mb() for the time being. We don't (yet) have any 32-bit ARMv8 support, and the efforts towards a single zImage could do without minor variations like this, not to mention the usual backlash I get whenever introducing something that needs a relatively recent binutils.
quoted
+#define smp_store_release(p, v) \ +do { \ + asm volatile ("stlr %w0 [%1]" : : "r" (v), "r" (&p) : "memory");\ +} while (0) + +#define smp_load_acquire(p) \ +do { \ + typeof(p) ___p1; \ + asm volatile ("ldar %w0, [%1]" \ + : "=r" (___p1) : "r" (&p) : "memory"); \ + return ___p1; \ +} while (0) #endif #define read_barrier_depends() do { } while(0)diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h index d4a63338a53c..0da2d4ebb9a8 100644 --- a/arch/arm64/include/asm/barrier.h +++ b/arch/arm64/include/asm/barrier.h@@ -35,10 +35,38 @@ #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() + +#define smp_store_release(p, v) \ +do { \ + smp_mb(); \ + ACCESS_ONCE(p) = (v); \ +} while (0) + +#define smp_load_acquire(p, v) \ +do { \ + typeof(p) ___p1 = ACCESS_ONCE(p); \ + smp_mb(); \ + return ___p1; \ +} while (0) + #else + #define smp_mb() asm volatile("dmb ish" : : : "memory") #define smp_rmb() asm volatile("dmb ishld" : : : "memory") #define smp_wmb() asm volatile("dmb ishst" : : : "memory") + +#define smp_store_release(p, v) \ +do { \ + asm volatile ("stlr %w0 [%1]" : : "r" (v), "r" (&p) : "memory");\
Missing comma between the operands. Also, that 'w' output modifier enforces a 32-bit store (same early question about sizes). Finally, it might be more efficient to use "=Q" for the addressing mode, rather than take the address of p manually.
quoted
+} while (0) + +#define smp_load_acquire(p) \ +do { \ + typeof(p) ___p1; \ + asm volatile ("ldar %w0, [%1]" \ + : "=r" (___p1) : "r" (&p) : "memory"); \ + return ___p1; \
Similar comments here wrt Q constraint. Random other question: have you considered how these accessors should behave when presented with __iomem pointers? Will