Thread (29 messages) 29 messages, 5 authors, 2015-10-14

Re: [PATCH v3 6/6] powerpc: atomic: Implement cmpxchg{,64}_* and atomic{,64}_cmpxchg_* variants

From: Will Deacon <hidden>
Date: 2015-10-13 14:43:36
Also in: lkml

On Tue, Oct 13, 2015 at 10:32:59PM +0800, Boqun Feng wrote:
On Tue, Oct 13, 2015 at 02:24:04PM +0100, Will Deacon wrote:
quoted
On Mon, Oct 12, 2015 at 10:14:06PM +0800, Boqun Feng wrote:
quoted
Implement cmpxchg{,64}_relaxed and atomic{,64}_cmpxchg_relaxed, based on
which _release variants can be built.

To avoid superfluous barriers in _acquire variants, we implement these
operations with assembly code rather use __atomic_op_acquire() to build
them automatically.
The "superfluous barriers" are for the case where the cmpxchg fails, right?
Yes.
quoted
And you don't do the same thing for release, because you want to avoid a
barrier in the middle of the critical section?
Mostly because of the comments in include/linux/atomic.h:

 * For compound atomics performing both a load and a store, ACQUIRE
 * semantics apply only to the load and RELEASE semantics only to the
 * store portion of the operation. Note that a failed cmpxchg_acquire
 * does -not- imply any memory ordering constraints.

so I thought only the barrier in cmpxchg_acquire() is conditional, and
the barrier in cmpxchg_release() is not. Maybe we'd better call it out
that cmpxchg *family* doesn't have any order guarantee if cmp fails, as
a complement of

ed2de9f74ecb ("locking/Documentation: Clarify failed cmpxchg() memory ordering semantics")

Because it seems this commit only claims that the barriers in fully
ordered version are conditional.
I didn't think this was ambiguous... A failed cmpxchg_release doesn't
perform a store, so because the RELEASE semantics only apply to the
store portion of the operation, it therefore doesn't have any ordering
guarantees. Acquire is called out as a special case because it *does*
actually perform a load on the failure case.
If cmpxchg_release doesn't have order guarantee when failed, I guess I
can implement it with a barrier in the middle as you mentioned:

	unsigned int prev;

	__asm__ __volatile__ (
"1:	lwarx	%0,0,%2		
	cmpw	0,%0,%3\n\
	bne-	2f\n"
	PPC_RELEASE_BARRIER
"	stwcx.	%4,0,%2\n\
	bne-	1b"
	"\n\
2:"
	: "=&r" (prev), "+m" (*p)
	: "r" (p), "r" (old), "r" (new)
	: "cc", "memory");

	return prev;


However, I need to check whether the architecture allows this and any
other problem exists.

Besides, I don't think it's a good idea to do the "put barrier in the
middle" thing in this patchset, because that seems a premature
optimization and if we go further, I guess we can also replace the
PPC_RELEASE_BARRIER above with a "sync" to implement a fully ordered
version cmpxchg(). Too much needs to investigate then..
Putting a barrier in the middle of that critical section is probably a
terrible idea, and that's why I thought you were avoiding it (hence my
original question). Perhaps just add a comment to that effect, since I
fear adding more words to memory-barriers.txt is just likely to create
further confusion.

Will
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help