Thread: 14 messages, 3 authors, 2019-02-05

Re: [PATCH 3/3] locking/qspinlock: Introduce starvation avoidance into CNA

From: Waiman Long <longman@redhat.com>
Date: 2019-02-05 13:48:33
Also in: linux-arch, lkml

On 02/05/2019 04:22 AM, Peter Zijlstra wrote:
On Mon, Feb 04, 2019 at 10:35:09PM -0500, Alex Kogan wrote:
On Jan 31, 2019, at 5:00 AM, Peter Zijlstra wrote:

On Wed, Jan 30, 2019 at 10:01:35PM -0500, Alex Kogan wrote:
Choose the next lock holder among spinning threads running on the same
socket with high probability rather than always. With small probability,
hand the lock to the first thread in the secondary queue or, if that
queue is empty, to the immediate successor of the current lock holder
in the main queue.  Thus, assuming no failures while threads hold the
lock, every thread would be able to acquire the lock after a bounded
number of lock transitions, with high probability.
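
As a rough illustration of the probabilistic choice described above, here is a minimal sketch in C. It is not the code from the patch: every identifier (handoff_target, xorshift32, rare_event, choose_handoff) and the 1-in-2^shift probability are assumptions made for this example, and the fallback when no same-socket waiter exists is likewise assumed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; not the identifiers used in the patch. */
enum handoff_target {
	HANDOFF_SAME_SOCKET,	/* a spinning waiter on the holder's socket */
	HANDOFF_SECONDARY_HEAD,	/* first waiter in the secondary queue */
	HANDOFF_MAIN_SUCCESSOR,	/* immediate successor in the main queue */
};

/*
 * Small xorshift32 generator, standing in for whatever cheap random
 * source a real implementation would use. *state must be nonzero.
 */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/* Returns true with probability of roughly 1 / 2^shift (shift < 32). */
static bool rare_event(uint32_t *state, unsigned int shift)
{
	return (xorshift32(state) & ((1u << shift) - 1)) == 0;
}

/*
 * Usually keep the lock on the current socket; occasionally (or when no
 * same-socket waiter exists, an assumption of this sketch) hand it to the
 * head of the secondary queue or, if that queue is empty, to the
 * immediate successor in the main queue.
 */
static enum handoff_target choose_handoff(uint32_t *rng_state,
					  unsigned int shift,
					  bool same_socket_waiter,
					  bool secondary_empty)
{
	if (same_socket_waiter && !rare_event(rng_state, shift))
		return HANDOFF_SAME_SOCKET;

	return secondary_empty ? HANDOFF_MAIN_SUCCESSOR
			       : HANDOFF_SECONDARY_HEAD;
}
```

With shift set to 0 the fairness path is taken on every handoff; larger values of shift make the inter-socket handoff rarer.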

Note that we could make the inter-socket transition deterministic,
by sticking a counter of intra-socket transitions in the head node
of the secondary queue. At the handoff time, we could increment
the counter and check if it is below a threshold. This adds another
field to queue nodes and a nearly certain local cache miss to read and
update this counter during the handoff. While still beating stock,
this variant adds certain overhead over the probabilistic variant.
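
The deterministic variant mentioned above could look roughly like the sketch below. The struct, field, and function names and the limit parameter are hypothetical, and resetting the counter after a forced handoff is an assumption of this example rather than something stated in the patch description.

```c
#include <stdbool.h>

/* Hypothetical extra state kept in the head node of the secondary queue. */
struct secondary_head {
	unsigned int intra_socket_handoffs;
};

/*
 * Called at handoff time. Increments the counter and checks it against
 * the threshold: returns true while the lock may stay on the current
 * socket, and false once `limit` is reached, meaning the lock should go
 * to the secondary queue instead. The reset on a forced handoff is an
 * assumption of this sketch.
 */
static bool may_stay_on_socket(struct secondary_head *head, unsigned int limit)
{
	if (++head->intra_socket_handoffs < limit)
		return true;

	head->intra_socket_handoffs = 0;
	return false;
}
```

The read-modify-write of intra_socket_handoffs is the access that the description expects to miss in the local cache, since the head node of the secondary queue belongs to a waiter on another socket.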
(also heavily suffers from the socket == node confusion)

How would you suggest RT 'tunes' this?

RT relies on FIFO fairness of the basic spinlock primitives; you just
completely wrecked that.
It is true that CNA trades some fairness for shorter lock handover
latency, much like any other NUMA-aware lock.

Can you explain, however, what exactly breaks here?
Timeliness guarantees. FIFO-fair has well-defined time behaviour; you
know exactly how long you get to wait before you acquire the lock,
namely however many waiters are in front of you multiplied by the worst
case wait time.
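
Written out, the FIFO bound described above is a simple product. The symbols are introduced here for illustration only: n is the number of waiters ahead of you and T_max is the worst-case time any single one of them takes to acquire, hold, and pass on the lock (one reading of "worst case wait time").

```latex
W_{\text{FIFO}} \le n \cdot T_{\max}
% Example: n = 7 waiters ahead, T_{\max} = 2\,\mu\text{s}
% gives W_{\text{FIFO}} \le 7 \cdot 2\,\mu\text{s} = 14\,\mu\text{s}.
```

With a randomized handoff, n is no longer fixed when a waiter joins the queue, so no fixed product of this form bounds the wait.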

Doing time analysis on a randomized algorithm isn't my idea of fun.
That RT doesn't work well with the NUMA qspinlock is another reason why I
want it to be a separate slow path. We will disable it on an RT kernel, where
guaranteed low latency is a must and throughput isn't as important.

Cheers,
Longman

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel