Re: [PATCH v2] barriers: introduce smp_mb__release_acquire and update documentation
From: Paul E. McKenney <hidden>
Date: 2015-10-08 21:44:44
Also in:
linux-arch, lkml
On Thu, Oct 08, 2015 at 01:16:38PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 08, 2015 at 02:50:36PM +1100, Michael Ellerman wrote:
> > On Wed, 2015-10-07 at 08:25 -0700, Paul E. McKenney wrote:
> > > Currently, we do need smp_mb__after_unlock_lock() to be after the acquisition on PPC -- putting it between the unlock and the lock of course doesn't cut it for the cross-thread unlock/lock case.
>
> This ^, that makes me think I don't understand smp_mb__after_unlock_lock.
>
> How is:
>
> 	UNLOCK x
> 	smp_mb__after_unlock_lock()
> 	LOCK y
>
> a problem? That's still a full barrier.

The problem is that I need smp_mb__after_unlock_lock() to give me transitivity even if the UNLOCK happened on one CPU and the LOCK on another. For that to work, the smp_mb__after_unlock_lock() needs to be either immediately after the acquire (the current choice) or immediately before the release (which would also work from a purely technical viewpoint, but I much prefer the current choice). Or am I missing your point?
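
For anyone who wants to poke at the cross-CPU case from userspace, here is a rough sketch using C11 atomics and pthreads. It is only an analogy and makes several assumptions: C11 is not the kernel memory model, the names lock_word, a, b, cpu0(), cpu1(), cpu2() and run_once() are made up for illustration, and atomic_thread_fence(memory_order_seq_cst) merely stands in for smp_mb__after_unlock_lock() and smp_mb(). The release happens in one thread, the acquire in another, and the full fence sits immediately after the acquire.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names: lock_word models lock x, a and b are shared data. */
static atomic_int lock_word;	/* 1 = held, 0 = free */
static atomic_int a, b;
static int saw_a, saw_b;	/* written by cpu2, read after join */

static void *cpu0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&a, 1, memory_order_relaxed);
	/* UNLOCK x: modeled as a release store. */
	atomic_store_explicit(&lock_word, 0, memory_order_release);
	return NULL;
}

static void *cpu1(void *arg)
{
	int expected;

	(void)arg;
	/* LOCK x: spin until an acquire compare-exchange succeeds. */
	do {
		expected = 0;
	} while (!atomic_compare_exchange_weak_explicit(&lock_word,
			&expected, 1,
			memory_order_acquire, memory_order_relaxed));
	/* Stand-in for smp_mb__after_unlock_lock(), right after the acquire. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&b, 1, memory_order_relaxed);
	return NULL;
}

static void *cpu2(void *arg)
{
	(void)arg;
	saw_b = atomic_load_explicit(&b, memory_order_relaxed);
	/* Stand-in for smp_mb() on the observing CPU. */
	atomic_thread_fence(memory_order_seq_cst);
	saw_a = atomic_load_explicit(&a, memory_order_relaxed);
	return NULL;
}

/* Returns 1 if the third thread saw b == 1 but a == 0, else 0. */
int run_once(void)
{
	void *(*fn[3])(void *) = { cpu0, cpu1, cpu2 };
	pthread_t t[3];
	int i;

	atomic_store(&lock_word, 1);	/* cpu0 starts out holding the lock */
	atomic_store(&a, 0);
	atomic_store(&b, 0);
	for (i = 0; i < 3; i++)
		assert(pthread_create(&t[i], NULL, fn[i], NULL) == 0);
	for (i = 0; i < 3; i++)
		assert(pthread_join(t[i], NULL) == 0);
	return saw_b == 1 && saw_a == 0;
}
```

With the fences in place, the C11 rules forbid cpu2 from seeing b == 1 and then a == 0, so run_once() should always return 0. A clean run on strongly ordered hardware proves nothing by itself, so treat this as a statement of the intended ordering rather than a test of it.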

> > > I am with Peter -- we do need the benchmark results for PPC.
> >
> > Urgh, sorry guys. I have been slowly doing some benchmarks, but time is not plentiful at the moment.
> >
> > If we do a straight lwsync -> sync conversion for unlock it looks like that will cost us ~4.2% on Anton's standard context switch benchmark.
>
> And that does not seem to agree with Paul's smp_mb__after_unlock_lock() usage and would not be sufficient for the same (as of yet unexplained) reason.
>
> Why does it matter which of the LOCK or UNLOCK gets promoted to full barrier on PPC in order to become RCsc?

You could do either. However, as I understand it, there is hardware for which bc;isync is faster than lwsync. For such hardware, it is cheaper to upgrade the unlock from lwsync to sync than to upgrade the lock from bc;isync to sync. If I recall correctly, the kernel rewrites itself at boot to select whichever of lwsync or bc;isync is better for the hardware at hand.

							Thanx, Paul