Thread (43 messages) 43 messages, 9 authors, 2019-12-19

Re: READ_ONCE() + STACKPROTECTOR_STRONG == :/ (was Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-2 tag (topic/kasan-bitops))

From: Peter Zijlstra <peterz@infradead.org>
Date: 2019-12-12 08:01:31
Also in: linux-arch, lkml
Subsystem: atomic infrastructure, generic include/asm header files, the rest · Maintainers: Will Deacon, Peter Zijlstra, Boqun Feng, Arnd Bergmann, Linus Torvalds

On Thu, Dec 12, 2019 at 04:42:13PM +1100, Michael Ellerman wrote:
[ trimmed CC a bit ]

Peter Zijlstra [off-list ref] writes:
quoted
On Fri, Dec 06, 2019 at 11:46:11PM +1100, Michael Ellerman wrote:
...
quoted
you write:

  "Currently bitops-instrumented.h assumes that the architecture provides
atomic, non-atomic and locking bitops (e.g. both set_bit and __set_bit).
This is true on x86 and s390, but is not always true: there is a
generic bitops/non-atomic.h header that provides generic non-atomic
operations, and also a generic bitops/lock.h for locking operations."

Is there any actual benefit for PPC to using their own atomic bitops
over bitops/lock.h ? I'm thinking that the generic code is fairly
optimal for most LL/SC architectures.
Yes and no :)

Some of the generic versions don't generate good code compared to our
versions, but that's because READ_ONCE() is triggering stack protector
to be enabled.
Bah, there's never anything simple, is there :/
For example, comparing an out-of-line copy of the generic and ppc
versions of test_and_set_bit_lock():

   1 <generic_test_and_set_bit_lock>:           1 <ppc_test_and_set_bit_lock>:
   2         addis   r2,r12,361
   3         addi    r2,r2,-4240
   4         stdu    r1,-48(r1)
   5         rlwinm  r8,r3,29,3,28
   6         clrlwi  r10,r3,26                   2         rldicl  r10,r3,58,6
   7         ld      r9,3320(r13)
   8         std     r9,40(r1)
   9         li      r9,0
  10         li      r9,1                        3         li      r9,1
                                                 4         clrlwi  r3,r3,26
                                                 5         rldicr  r10,r10,3,60
  11         sld     r9,r9,r10                   6         sld     r3,r9,r3
  12         add     r10,r4,r8                   7         add     r4,r4,r10
  13         ldx     r8,r4,r8
  14         and.    r8,r9,r8
  15         bne     34f
  16         ldarx   r7,0,r10                    8         ldarx   r9,0,r4,1
  17         or      r8,r9,r7                    9         or      r10,r9,r3
  18         stdcx.  r8,0,r10                   10         stdcx.  r10,0,r4
  19         bne-    16b                        11         bne-    8b
  20         isync                              12         isync
  21         and     r9,r7,r9                   13         and     r3,r3,r9
  22         addic   r7,r9,-1                   14         addic   r9,r3,-1
  23         subfe   r7,r7,r9                   15         subfe   r3,r9,r3
  24         ld      r9,40(r1)
  25         ld      r10,3320(r13)
  26         xor.    r9,r9,r10
  27         li      r10,0
  28         mr      r3,r7
  29         bne     36f
  30         addi    r1,r1,48
  31         blr                                16         blr
  32         nop
  33         nop
  34         li      r7,1
  35         b       24b
  36         mflr    r0
  37         std     r0,64(r1)
  38         bl      <__stack_chk_fail+0x8>


If you squint, the generated code for the actual logic is pretty similar, but
the stack protector gunk makes a big mess. It's particularly bad here
because the ppc version doesn't even need a stack frame.

I've also confirmed that even when test_and_set_bit_lock() is inlined
into an actual call site the stack protector logic still triggers.
If I change the READ_ONCE() in test_and_set_bit_lock():

	if (READ_ONCE(*p) & mask)
		return 1;

to a regular pointer access:

	if (*p & mask)
		return 1;

Then the generated code looks more or less the same, except for the extra early
return in the generic version of test_and_set_bit_lock(), and different handling
of the return code by the compiler.
So given that the function signature is:

static inline int test_and_set_bit_lock(unsigned int nr,
					volatile unsigned long *p)

@p already carries the required volatile qualifier, so READ_ONCE() does
not add anything here (except for easier to read code and poor code
generation).

So your proposed change _should_ be fine. Will, I'm assuming you never
saw this on your ARGH64 builds when you did this code ?

---
diff --git a/include/asm-generic/bitops/atomic.h b/include/asm-generic/bitops/atomic.h
index dd90c9792909..10264e8808f8 100644
--- a/include/asm-generic/bitops/atomic.h
+++ b/include/asm-generic/bitops/atomic.h
@@ -35,7 +35,7 @@ static inline int test_and_set_bit(unsigned int nr, volatile unsigned long *p)
 	unsigned long mask = BIT_MASK(nr);
 
 	p += BIT_WORD(nr);
-	if (READ_ONCE(*p) & mask)
+	if (*p & mask)
 		return 1;
 
 	old = atomic_long_fetch_or(mask, (atomic_long_t *)p);
@@ -48,7 +48,7 @@ static inline int test_and_clear_bit(unsigned int nr, volatile unsigned long *p)
 	unsigned long mask = BIT_MASK(nr);
 
 	p += BIT_WORD(nr);
-	if (!(READ_ONCE(*p) & mask))
+	if (!(*p & mask))
 		return 0;
 
 	old = atomic_long_fetch_andnot(mask, (atomic_long_t *)p);
diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
index 3ae021368f48..9baf0a0055b8 100644
--- a/include/asm-generic/bitops/lock.h
+++ b/include/asm-generic/bitops/lock.h
@@ -22,7 +22,7 @@ static inline int test_and_set_bit_lock(unsigned int nr,
 	unsigned long mask = BIT_MASK(nr);
 
 	p += BIT_WORD(nr);
-	if (READ_ONCE(*p) & mask)
+	if (*p & mask)
 		return 1;
 
 	old = atomic_long_fetch_or_acquire(mask, (atomic_long_t *)p);
@@ -60,7 +60,7 @@ static inline void __clear_bit_unlock(unsigned int nr,
 	unsigned long old;
 
 	p += BIT_WORD(nr);
-	old = READ_ONCE(*p);
+	old = *p;
 	old &= ~BIT_MASK(nr);
 	atomic_long_set_release((atomic_long_t *)p, old);
 }
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help