Thread: 32 messages, 6 authors, 2015-10-21

Re: [PATCH v2] barriers: introduce smp_mb__release_acquire and update documentation

From: Paul E. McKenney <hidden>
Date: 2015-10-08 21:44:44
Also in: linux-arch, lkml

On Thu, Oct 08, 2015 at 01:16:38PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 08, 2015 at 02:50:36PM +1100, Michael Ellerman wrote:
> > On Wed, 2015-10-07 at 08:25 -0700, Paul E. McKenney wrote:
>
> > > Currently, we do need smp_mb__after_unlock_lock() to be after the
> > > acquisition on PPC -- putting it between the unlock and the lock
> > > of course doesn't cut it for the cross-thread unlock/lock case.
>
> This ^, that makes me think I don't understand
> smp_mb__after_unlock_lock.
>
> How is:
>
> 	UNLOCK x
> 	smp_mb__after_unlock_lock()
> 	LOCK y
>
> a problem? That's still a full barrier.

The problem is that I need smp_mb__after_unlock_lock() to give me
transitivity even if the UNLOCK happened on one CPU and the LOCK
on another.  For that to work, the smp_mb__after_unlock_lock() needs
to be either immediately after the acquire (the current choice) or
immediately before the release (which would also work from a purely
technical viewpoint, but I much prefer the current choice).

Or am I missing your point?
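
To make the cross-CPU case concrete, here is a rough userspace sketch
using C11 atomics and pthreads.  It is not kernel code, and the C11
memory model is not the kernel's, so treat it only as a loose analogy.
All names (cpu0, cpu1, cpu2, lock_word, count_forbidden) are made up
for illustration, and the seq_cst fences merely stand in for
smp_mb__after_unlock_lock() and smp_mb().  The UNLOCK happens on one
thread, the LOCK plus the full barrier on a second, and a third thread
that never touches the lock acts as the outside observer.  The outcome
the barrier is meant to exclude is both loads returning zero.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for illustration only, not kernel code. */
static atomic_int x, y;		/* shared data */
static atomic_int lock_word;	/* 1 = held, 0 = free */
static int r1, r2;		/* observed values, read after join */

/* CPU 0 holds the lock, writes x, then does the UNLOCK. */
static void *cpu0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	atomic_store_explicit(&lock_word, 0, memory_order_release);
	return NULL;
}

/* CPU 1 does the LOCK of that same lock, then the full barrier. */
static void *cpu1(void *arg)
{
	int expected = 0;

	(void)arg;
	while (!atomic_compare_exchange_weak_explicit(&lock_word, &expected, 1,
			memory_order_acquire, memory_order_relaxed))
		expected = 0;	/* a failed CAS overwrote it, so reset and retry */
	/* Stand-in for smp_mb__after_unlock_lock(), right after the acquire. */
	atomic_thread_fence(memory_order_seq_cst);
	r1 = atomic_load_explicit(&y, memory_order_relaxed);
	return NULL;
}

/* CPU 2 never touches the lock; it is the outside observer. */
static void *cpu2(void *arg)
{
	(void)arg;
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* stand-in for smp_mb() */
	r2 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/*
 * Run the three threads n times and count the runs in which both
 * loads saw 0.  Error handling for pthread_create() is omitted.
 */
static int count_forbidden(int n)
{
	int bad = 0;

	for (int i = 0; i < n; i++) {
		pthread_t t[3];

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		atomic_store(&lock_word, 1);	/* CPU 0 starts out holding the lock */
		pthread_create(&t[0], NULL, cpu0, NULL);
		pthread_create(&t[1], NULL, cpu1, NULL);
		pthread_create(&t[2], NULL, cpu2, NULL);
		for (int j = 0; j < 3; j++)
			pthread_join(t[j], NULL);
		if (r1 == 0 && r2 == 0)
			bad++;
	}
	return bad;
}
```

Calling count_forbidden() from a main() and building with -pthread is
enough to try it.  A count of zero does not prove anything, and on
strongly ordered hardware such as x86 the outcome cannot show up even
without the fence in cpu1(), so the sketch is only meant to show where
the barrier sits relative to the acquire.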

> > > I am with Peter -- we do need the benchmark results for PPC.
> >
> > Urgh, sorry guys. I have been slowly doing some benchmarks, but time is not
> > plentiful at the moment.
> >
> > If we do a straight lwsync -> sync conversion for unlock it looks like that
> > will cost us ~4.2% on Anton's standard context switch benchmark.
>
> And that does not seem to agree with Paul's smp_mb__after_unlock_lock()
> usage and would not be sufficient for the same (as of yet unexplained)
> reason.
>
> Why does it matter which of the LOCK or UNLOCK gets promoted to full
> barrier on PPC in order to become RCsc?

You could do either.  However, as I understand it, there is hardware for
which bc;isync is faster than lwsync.  For such hardware, it is cheaper
to upgrade the unlock from lwsync to sync than to upgrade the lock from
bc;isync to sync.  If I recall correctly, the kernel rewrites itself at
boot to select whichever of lwsync or bc;isync is better for the hardware
at hand.

							Thanx, Paul