Thread (46 messages) 46 messages, 7 authors, 2020-07-25

Re: [PATCH v3 0/6] powerpc: queued spinlocks and rwlocks

From: Nicholas Piggin <npiggin@gmail.com>
Date: 2020-07-07 05:57:13
Also in: linux-arch, lkml, virtualization

Excerpts from Waiman Long's message of July 7, 2020 4:39 am:
On 7/6/20 12:35 AM, Nicholas Piggin wrote:
quoted
v3 is updated to use __pv_queued_spin_unlock, noticed by Waiman (thank you).

Thanks,
Nick

Nicholas Piggin (6):
   powerpc/powernv: must include hvcall.h to get PAPR defines
   powerpc/pseries: move some PAPR paravirt functions to their own file
   powerpc: move spinlock implementation to simple_spinlock
   powerpc/64s: implement queued spinlocks and rwlocks
   powerpc/pseries: implement paravirt qspinlocks for SPLPAR
   powerpc/qspinlock: optimised atomic_try_cmpxchg_lock that adds the
     lock hint

  arch/powerpc/Kconfig                          |  13 +
  arch/powerpc/include/asm/Kbuild               |   2 +
  arch/powerpc/include/asm/atomic.h             |  28 ++
  arch/powerpc/include/asm/paravirt.h           |  89 +++++
  arch/powerpc/include/asm/qspinlock.h          |  91 ++++++
  arch/powerpc/include/asm/qspinlock_paravirt.h |   7 +
  arch/powerpc/include/asm/simple_spinlock.h    | 292 +++++++++++++++++
  .../include/asm/simple_spinlock_types.h       |  21 ++
  arch/powerpc/include/asm/spinlock.h           | 308 +-----------------
  arch/powerpc/include/asm/spinlock_types.h     |  17 +-
  arch/powerpc/lib/Makefile                     |   3 +
  arch/powerpc/lib/locks.c                      |  12 +-
  arch/powerpc/platforms/powernv/pci-ioda-tce.c |   1 +
  arch/powerpc/platforms/pseries/Kconfig        |   5 +
  arch/powerpc/platforms/pseries/setup.c        |   6 +-
  include/asm-generic/qspinlock.h               |   4 +
  16 files changed, 577 insertions(+), 322 deletions(-)
  create mode 100644 arch/powerpc/include/asm/paravirt.h
  create mode 100644 arch/powerpc/include/asm/qspinlock.h
  create mode 100644 arch/powerpc/include/asm/qspinlock_paravirt.h
  create mode 100644 arch/powerpc/include/asm/simple_spinlock.h
  create mode 100644 arch/powerpc/include/asm/simple_spinlock_types.h
This patch looks OK to me.
Thanks for reviewing and testing.
I had run some microbenchmark on powerpc system with or w/o the patch.

On a 2-socket 160-thread SMT4 POWER9 system (not virtualized):

5.8.0-rc4
=========

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 160, Min/Mean/Max = 77,665/90,153/106,895
Threads = 160, Total Rate = 1,441,759 op/s; Percpu Rate = 9,011 op/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 160, Min/Mean/Max = 47,879/53,807/63,689
Threads = 160, Total Rate = 860,192 op/s; Percpu Rate = 5,376 op/s

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 80, Min/Mean/Max = 242,907/319,514/463,161
Threads = 80, Total Rate = 2,555 kop/s; Percpu Rate = 32 kop/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 80, Min/Mean/Max = 146,161/187,474/259,270
Threads = 80, Total Rate = 1,498 kop/s; Percpu Rate = 19 kop/s

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 40, Min/Mean/Max = 646,639/1,000,817/1,455,205
Threads = 40, Total Rate = 4,001 kop/s; Percpu Rate = 100 kop/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 40, Min/Mean/Max = 402,165/597,132/814,555
Threads = 40, Total Rate = 2,388 kop/s; Percpu Rate = 60 kop/s

5.8.0-rc4-qlock+
================

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 160, Min/Mean/Max = 123,835/124,580/124,587
Threads = 160, Total Rate = 1,992 kop/s; Percpu Rate = 12 kop/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 160, Min/Mean/Max = 254,210/264,714/276,784
Threads = 160, Total Rate = 4,231 kop/s; Percpu Rate = 26 kop/s

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 80, Min/Mean/Max = 599,715/603,397/603,450
Threads = 80, Total Rate = 4,825 kop/s; Percpu Rate = 60 kop/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 80, Min/Mean/Max = 492,687/525,224/567,456
Threads = 80, Total Rate = 4,199 kop/s; Percpu Rate = 52 kop/s

Running locktest with spinlock [runtime = 10s, load = 1]
Threads = 40, Min/Mean/Max = 1,325,623/1,325,628/1,325,636
Threads = 40, Total Rate = 5,299 kop/s; Percpu Rate = 132 kop/s

Running locktest with rwlock [runtime = 10s, r% = 50%, load = 1]
Threads = 40, Min/Mean/Max = 1,249,731/1,292,977/1,342,815
Threads = 40, Total Rate = 5,168 kop/s; Percpu Rate = 129 kop/s

On systems on large number of cpus, qspinlock lock is faster and more fair.

With some tuning, we may be able to squeeze out more performance.
Yes, powerpc could certainly get more performance out of the slow
paths, and then there are a few parameters to tune.

We don't have a good alternate patching for function calls yet, but
that would be something to do for native vs pv.

And then there seem to be one or two tunable parameters we could
experiment with.

The paravirt locks may need a bit more tuning. Some simple testing
under KVM shows we might be a bit slower in some cases. Whether this
is fairness or something else I'm not sure. The current simple pv
spinlock code can do a directed yield to the lock holder CPU, whereas 
the pv qspl here just does a general yield. I think we might actually
be able to change that to also support directed yield. Though I'm
not sure if this is actually the cause of the slowdown yet.

Thanks,
Nick
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help