Re: READ_ONCE() + STACKPROTECTOR_STRONG == :/ (was Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-2 tag (topic/kasan-bitops))
From: Peter Zijlstra <peterz@infradead.org>
Date: 2019-12-12 08:01:31
Also in:
linux-arch, lkml
Subsystem:
atomic infrastructure, generic include/asm header files, the rest · Maintainers:
Will Deacon, Peter Zijlstra, Boqun Feng, Arnd Bergmann, Linus Torvalds
On Thu, Dec 12, 2019 at 04:42:13PM +1100, Michael Ellerman wrote:
[ trimmed CC a bit ] Peter Zijlstra [off-list ref] writes:quoted
On Fri, Dec 06, 2019 at 11:46:11PM +1100, Michael Ellerman wrote:...quoted
you write: "Currently bitops-instrumented.h assumes that the architecture provides atomic, non-atomic and locking bitops (e.g. both set_bit and __set_bit). This is true on x86 and s390, but is not always true: there is a generic bitops/non-atomic.h header that provides generic non-atomic operations, and also a generic bitops/lock.h for locking operations." Is there any actual benefit for PPC to using their own atomic bitops over bitops/lock.h ? I'm thinking that the generic code is fairly optimal for most LL/SC architectures.Yes and no :) Some of the generic versions don't generate good code compared to our versions, but that's because READ_ONCE() is triggering stack protector to be enabled.
Bah, there's never anything simple, is there :/
For example, comparing an out-of-line copy of the generic and ppc
versions of test_and_set_bit_lock():
1 <generic_test_and_set_bit_lock>: 1 <ppc_test_and_set_bit_lock>:
2 addis r2,r12,361
3 addi r2,r2,-4240
4 stdu r1,-48(r1)
5 rlwinm r8,r3,29,3,28
6 clrlwi r10,r3,26 2 rldicl r10,r3,58,6
7 ld r9,3320(r13)
8 std r9,40(r1)
9 li r9,0
10 li r9,1 3 li r9,1
4 clrlwi r3,r3,26
5 rldicr r10,r10,3,60
11 sld r9,r9,r10 6 sld r3,r9,r3
12 add r10,r4,r8 7 add r4,r4,r10
13 ldx r8,r4,r8
14 and. r8,r9,r8
15 bne 34f
16 ldarx r7,0,r10 8 ldarx r9,0,r4,1
17 or r8,r9,r7 9 or r10,r9,r3
18 stdcx. r8,0,r10 10 stdcx. r10,0,r4
19 bne- 16b 11 bne- 8b
20 isync 12 isync
21 and r9,r7,r9 13 and r3,r3,r9
22 addic r7,r9,-1 14 addic r9,r3,-1
23 subfe r7,r7,r9 15 subfe r3,r9,r3
24 ld r9,40(r1)
25 ld r10,3320(r13)
26 xor. r9,r9,r10
27 li r10,0
28 mr r3,r7
29 bne 36f
30 addi r1,r1,48
31 blr 16 blr
32 nop
33 nop
34 li r7,1
35 b 24b
36 mflr r0
37 std r0,64(r1)
38 bl <__stack_chk_fail+0x8>
If you squint, the generated code for the actual logic is pretty similar, but
the stack protector gunk makes a big mess. It's particularly bad here
because the ppc version doesn't even need a stack frame.
I've also confirmed that even when test_and_set_bit_lock() is inlined
into an actual call site the stack protector logic still triggers.If I change the READ_ONCE() in test_and_set_bit_lock(): if (READ_ONCE(*p) & mask) return 1; to a regular pointer access: if (*p & mask) return 1; Then the generated code looks more or less the same, except for the extra early return in the generic version of test_and_set_bit_lock(), and different handling of the return code by the compiler.
So given that the function signature is: static inline int test_and_set_bit_lock(unsigned int nr, volatile unsigned long *p) @p already carries the required volatile qualifier, so READ_ONCE() does not add anything here (except for easier to read code and poor code generation). So your proposed change _should_ be fine. Will, I'm assuming you never saw this on your ARGH64 builds when you did this code ? ---
diff --git a/include/asm-generic/bitops/atomic.h b/include/asm-generic/bitops/atomic.h
index dd90c9792909..10264e8808f8 100644
--- a/include/asm-generic/bitops/atomic.h
+++ b/include/asm-generic/bitops/atomic.h@@ -35,7 +35,7 @@ static inline int test_and_set_bit(unsigned int nr, volatile unsigned long *p) unsigned long mask = BIT_MASK(nr); p += BIT_WORD(nr); - if (READ_ONCE(*p) & mask) + if (*p & mask) return 1; old = atomic_long_fetch_or(mask, (atomic_long_t *)p);
@@ -48,7 +48,7 @@ static inline int test_and_clear_bit(unsigned int nr, volatile unsigned long *p) unsigned long mask = BIT_MASK(nr); p += BIT_WORD(nr); - if (!(READ_ONCE(*p) & mask)) + if (!(*p & mask)) return 0; old = atomic_long_fetch_andnot(mask, (atomic_long_t *)p);
diff --git a/include/asm-generic/bitops/lock.h b/include/asm-generic/bitops/lock.h
index 3ae021368f48..9baf0a0055b8 100644
--- a/include/asm-generic/bitops/lock.h
+++ b/include/asm-generic/bitops/lock.h@@ -22,7 +22,7 @@ static inline int test_and_set_bit_lock(unsigned int nr, unsigned long mask = BIT_MASK(nr); p += BIT_WORD(nr); - if (READ_ONCE(*p) & mask) + if (*p & mask) return 1; old = atomic_long_fetch_or_acquire(mask, (atomic_long_t *)p);
@@ -60,7 +60,7 @@ static inline void __clear_bit_unlock(unsigned int nr, unsigned long old; p += BIT_WORD(nr); - old = READ_ONCE(*p); + old = *p; old &= ~BIT_MASK(nr); atomic_long_set_release((atomic_long_t *)p, old); }