Re: [PATCH v9 3/5] locking/qspinlock: Introduce CNA into the slow path of qspinlock | linux-arm-kernel

quoted

On Thu, Jan 23, 2020 at 11:16:49AM +0100, Peter Zijlstra wrote:
On Thu, Jan 23, 2020 at 11:06:35AM +0100, Peter Zijlstra wrote:
On Thu, Jan 23, 2020 at 10:26:58AM +0100, Peter Zijlstra wrote:
On Tue, Jan 14, 2020 at 10:59:18PM -0500, Alex Kogan wrote:
+/* this function is called only when the primary queue is empty */
+static inline bool cna_try_change_tail(struct qspinlock *lock, u32 val,
+				       struct mcs_spinlock *node)
+{
+	struct mcs_spinlock *head_2nd, *tail_2nd;
+	u32 new;
+
+	/* If the secondary queue is empty, do what MCS does. */
+	if (node->locked <= 1)
+		return __try_clear_tail(lock, val, node);
+
+	/*
+	 * Try to update the tail value to the last node in the secondary queue.
+	 * If successful, pass the lock to the first thread in the secondary
+	 * queue. Doing those two actions effectively moves all nodes from the
+	 * secondary queue into the main one.
+	 */
+	tail_2nd = decode_tail(node->locked);
+	head_2nd = tail_2nd->next;
+	new = ((struct cna_node *)tail_2nd)->encoded_tail + _Q_LOCKED_VAL;
+
+	if (atomic_try_cmpxchg_relaxed(&lock->val, &val, new)) {
+		/*
+		 * Try to reset @next in tail_2nd to NULL, but no need to check
+		 * the result - if failed, a new successor has updated it.
+		 */
I think you actually have an ordering bug here; the load of head_2nd
*must* happen before the atomic_try_cmpxchg(), otherwise it might
observe the new next and clear a valid next pointer.

What would be the best fix for that; I'm thinking:

	head_2nd = smp_load_acquire(&tail_2nd->next);

Will?
Hmm, given we've not passed the lock around yet; why wouldn't something
like this work:

	smp_store_release(&tail_2nd->next, NULL);
Argh, make that:

	tail_2nd->next = NULL;

	smp_wmb();

	if (!atomic_try_cmpxchg_relaxed(&lock, &val, new)) {
... or could you drop the smp_wmb() and make this
atomic_try_cmpxchg_release()?

To be honest, I've failed to understand the code prior to your changes
in this area: it appears to reply on a control-dependency from the two
cmpxchg_relaxed() calls (which isn't sufficient to order the store parts
afaict) and I also don't get how we deal with a transiently circular primary
queue.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help