Thread (43 messages) 43 messages, 6 authors, 2015-10-26

Re: [PATCH tip/locking/core v4 1/6] powerpc: atomic: Make *xchg and *cmpxchg a full barrier

From: Boqun Feng <hidden>
Date: 2015-10-15 04:48:27
Also in: lkml, stable

On Wed, Oct 14, 2015 at 08:07:05PM -0700, Paul E. McKenney wrote:
On Thu, Oct 15, 2015 at 08:53:21AM +0800, Boqun Feng wrote:
[snip]
quoted
I'm afraid more than that, the above litmus also shows that

	CPU 0				CPU 1
	-----				-----

	WRITE_ONCE(x, 1);		WRITE_ONCE(a, 2);
	r3 = xchg_release(&a, 1);	smp_mb();
					r3 = READ_ONCE(x);

	(0:r3 == 0 && 1:r3 == 0 && a == 2) is not prohibitted

in the implementation of this patchset, which should be disallowed by
the semantics of RELEASE, right?
Not necessarily.  If you had the read first on CPU 1, and you had a
similar problem, I would be more worried.
Sometimes I think maybe we should say that a single unpaired ACQUIRE or
RELEASE doesn't have any order guarantee because of the above case.

But seems that's not a normal or even existing case, my bad ;-(
quoted
And even:

	CPU 0				CPU 1
	-----				-----

	WRITE_ONCE(x, 1);		WRITE_ONCE(a, 2);
	smp_store_release(&a, 1);	smp_mb();
					r3 = READ_ONCE(x);

	(1:r3 == 0 && a == 2) is not prohibitted

shows by:

	PPC weird-lwsync
	""
	{
	0:r1=1; 0:r2=x; 0:r3=3; 0:r12=a;
	1:r1=2; 1:r2=x; 1:r3=3; 1:r12=a;
	}
	 P0                 | P1                 ;
	 stw r1,0(r2)       | stw r1,0(r12)      ;
	 lwsync             | sync               ;
	 stw  r1,0(r12)	    | lwz r3,0(r2)       ;
	exists
	(a=2 /\ 1:r3=0)

Please find something I'm (or the tool is) missing, maybe we can't use
(a == 2) as a indication that STORE on CPU 1 happens after STORE on CPU
0?
Again, if you were pairing the smp_store_release() with an smp_load_acquire()
or even a READ_ONCE() followed by a barrier, I would be quite concerned.
I am not at all worried about the above two litmus tests.
Understood, thank you for think through that ;-)
quoted
And there is really something I find strange, see below.
quoted
quoted
So the scenario that would fail would be this one, right?

a = x = 0

	CPU0				CPU1

	r3 = load_locked (&a);
					a = 2;
					sync();
					r3 = x;
	x = 1;
	lwsync();
	if (!store_cond(&a, 1))
		goto again


Where we hoist the load way up because lwsync allows this.
That scenario would end up with a==1 rather than a==2.
quoted
I always thought this would fail because CPU1's store to @a would fail
the store_cond() on CPU0 and we'd do the 'again' thing, re-issuing the
load and now seeing the new value (2).
The stwcx. failure was one thing that prevented a number of other
misordering cases.  The problem is that we have to let go of the notion
of an implicit global clock.

To that end, the herd tool can make a diagram of what it thought
happened, and I have attached it.  I used this diagram to try and force
this scenario at https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html#PPC,
and succeeded.  Here is the sequence of events:

o	Commit P0's write.  The model offers to propagate this write
	to the coherence point and to P1, but don't do so yet.

o	Commit P1's write.  Similar offers, but don't take them up yet.

o	Commit P0's lwsync.

o	Execute P0's lwarx, which reads a=0.  Then commit it.

o	Commit P0's stwcx. as successful.  This stores a=1.

o	Commit P0's branch (not taken).
So at this point, P0's write to 'a' has propagated to P1, right? But
P0's write to 'x' hasn't, even there is a lwsync between them, right?
Doesn't the lwsync prevent this from happening?
No, because lwsync is quite a bit weaker than sync aside from just
the store-load ordering.
Understood, I've tried the ppcmem, much clear now ;-)
quoted
If at this point P0's write to 'a' hasn't propagated then when?
Later.  At the very end of the test, in this case.  ;-)
Hmm.. I tried exactly this sequence in ppcmem, seems propagation of P0's
write to 'a' is never an option...
Why not try creating a longer litmus test that requires P0's write to
"a" to propagate to P1 before both processes complete?
I will try to write one, but to be clear, you mean we still observe 

0:r3 == 0 && a == 2 && 1:r3 == 0 

at the end, right? Because I understand that if P1's write to 'a'
doesn't override P0's, P0's write to 'a' will propagate.

Regards,
Boqun

Attachments

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help