[PATCH 2/9] locking/qrwlock: avoid redundant atomic_add_return on read_lock_slowpath

[PATCH 0/9] locking/qrwlock: get qrwlocks up and running on arm64 · Will Deacon <hidden> · 2015-07-07
[PATCH 1/9] locking/qrwlock: include <linux/spinlock.h> for arch_spin_{lock, unlock} · Will Deacon <hidden> · 2015-07-07
[PATCH 2/9] locking/qrwlock: avoid redundant atomic_add_return on read_lock_slowpath · Will Deacon <hidden> · 2015-07-07
[PATCH 2/9] locking/qrwlock: avoid redundant atomic_add_return on read_lock_slowpath · Waiman Long <hidden> · 2015-07-07
[PATCH 2/9] locking/qrwlock: avoid redundant atomic_add_return on read_lock_slowpath · Will Deacon <hidden> · 2015-07-07
[PATCH 2/9] locking/qrwlock: avoid redundant atomic_add_return on read_lock_slowpath · Waiman Long <hidden> · 2015-07-07
[PATCH 2/9] locking/qrwlock: avoid redundant atomic_add_return on read_lock_slowpath · peterz@infradead.org (Peter Zijlstra) · 2015-07-08
[PATCH 2/9] locking/qrwlock: avoid redundant atomic_add_return on read_lock_slowpath · Will Deacon <hidden> · 2015-07-08
[PATCH 2/9] locking/qrwlock: avoid redundant atomic_add_return on read_lock_slowpath · peterz@infradead.org (Peter Zijlstra) · 2015-07-07
[PATCH 3/9] locking/qrwlock: tidy up rspin_until_writer_unlock · Will Deacon <hidden> · 2015-07-07
[PATCH 4/9] locking/qrwlock: implement queue_write_unlock using smp_store_release · Will Deacon <hidden> · 2015-07-07
[PATCH 4/9] locking/qrwlock: implement queue_write_unlock using smp_store_release · peterz@infradead.org (Peter Zijlstra) · 2015-07-08
[PATCH 5/9] locking/qrwlock: remove redundant cmpxchg barriers on writer slow-path · Will Deacon <hidden> · 2015-07-07
[PATCH 5/9] locking/qrwlock: remove redundant cmpxchg barriers on writer slow-path · peterz@infradead.org (Peter Zijlstra) · 2015-07-08
[PATCH 5/9] locking/qrwlock: remove redundant cmpxchg barriers on writer slow-path · Will Deacon <hidden> · 2015-07-08
[PATCH 6/9] locking/qrwlock: allow architectures to hook in to contended paths · Will Deacon <hidden> · 2015-07-07
[PATCH 6/9] locking/qrwlock: allow architectures to hook in to contended paths · peterz@infradead.org (Peter Zijlstra) · 2015-07-08
[PATCH 6/9] locking/qrwlock: allow architectures to hook in to contended paths · Will Deacon <hidden> · 2015-07-08
[PATCH 7/9] locking/qrwlock: expose internal lock structure in qrwlock definition · Will Deacon <hidden> · 2015-07-07
[PATCH 8/9] arm64: cmpxchg: implement cmpxchg_relaxed · Will Deacon <hidden> · 2015-07-07
[PATCH 9/9] arm64: locking: replace read/write locks with generic qrwlock code · Will Deacon <hidden> · 2015-07-07

STALE3991d

From: Waiman Long <hidden>
Date: 2015-07-07 19:28:13
Also in: linux-arch

On 07/07/2015 02:19 PM, Will Deacon wrote:

On Tue, Jul 07, 2015 at 06:51:54PM +0100, Waiman Long wrote:

quoted

On 07/07/2015 01:24 PM, Will Deacon wrote:

quoted

When a slow-path reader gets to the front of the wait queue outside of
interrupt context, it waits for any writers to drain, increments the
reader count and again waits for any additional writers that may have
snuck in between the initial check and the increment.

Given that this second check is performed with acquire semantics, there
is no need to perform the increment using atomic_add_return, which acts
as a full barrier.

This patch changes the slow-path code to use smp_load_acquire and
atomic_add instead of atomic_add_return. Since the check only involves
the writer count, we can perform the acquire after the add.

Signed-off-by: Will Deacon<redacted>
---
   kernel/locking/qrwlock.c | 3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/locking/qrwlock.c b/kernel/locking/qrwlock.c
index 96b77d1e0545..4e29bef688ac 100644
--- a/kernel/locking/qrwlock.c
+++ b/kernel/locking/qrwlock.c

@@ -98,7 +98,8 @@ void queued_read_lock_slowpath(struct qrwlock *lock, u32 cnts)
   	while (atomic_read(&lock->cnts)&   _QW_WMASK)
   		cpu_relax_lowlatency();

-	cnts = atomic_add_return(_QR_BIAS,&lock->cnts) - _QR_BIAS;
+	atomic_add(_QR_BIAS,&lock->cnts);
+	cnts = smp_load_acquire((u32 *)&lock->cnts);
   	rspin_until_writer_unlock(lock, cnts);

   	/*

Atomic add in x86 is actually a full barrier too. The performance
difference between "lock add" and "lock xadd" should be minor. The
additional load, however, could potentially cause an additional
cacheline load on a contended lock. So do you see actual performance
benefit of this change in ARM?

I'd need to re-run the numbers, but atomic_add is significantly less
work on ARM than atomic_add_return, which basically has two full memory
barriers compared to none for the former.

Will

I think a compromise is to encapsulate that in an inlined function that 
can be overridden by architecture specific code. For example,

#ifndef queued_inc_reader_return
static inline u32 queued_inc_reader_return(struct qrwlock *lock)
{
     return atomic_add_return(_QR_BIAS, &lock->cnts) - _QR_BIAS;
}
#endif

:

cnts = queued_inc_reader_return(lock);

This also means that you will need to keep the function prototype for 
rspin_until_writer_unlock() in the later patch. Other than that, I don't 
see any other issue in the patch series.

BTW, are you also planning to use qspinlock in the ARM64 code?

Cheers,
Longman

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help