Re: perf events ring buffer memory barrier on powerpc
From: Paul E. McKenney <hidden>
Date: 2013-11-04 10:53:01
On Mon, Nov 04, 2013 at 09:57:17AM +0000, Will Deacon wrote:
> Hi Paul,
>
> On Sun, Nov 03, 2013 at 10:47:12PM +0000, Paul E. McKenney wrote:
> > On Sun, Nov 03, 2013 at 05:07:59PM +0000, Will Deacon wrote:
> > > On Sun, Nov 03, 2013 at 02:40:17PM +0000, Paul E. McKenney wrote:
> > > > On Sat, Nov 02, 2013 at 10:32:39AM -0700, Paul E. McKenney wrote:
> > > > > On Fri, Nov 01, 2013 at 03:56:34PM +0100, Peter Zijlstra wrote:
> > > > > > On Wed, Oct 30, 2013 at 11:40:15PM -0700, Paul E. McKenney wrote:
> > > > > > > > Now the whole crux of the question is if we need barrier A
> > > > > > > > at all, since the STORES issued by the @buf writes are
> > > > > > > > dependent on the ubuf->tail read.
> > > > > > >
> > > > > > > The dependency you are talking about is via the "if" statement?
> > > > > > > Even C/C++11 is not required to respect control dependencies.
> > > > > > >
> > > > > > > This one is a bit annoying.  The x86 TSO means that you really
> > > > > > > only need barrier(), ARM (recent ARM, anyway) and Power could
> > > > > > > use a weaker barrier, and so on -- but smp_mb() emits a full
> > > > > > > barrier.
> > > > > > >
> > > > > > > Perhaps a new smp_tmb() for TSO semantics, where reads are
> > > > > > > ordered before reads, writes before writes, and reads before
> > > > > > > writes, but not writes before reads?  Another approach would
> > > > > > > be to define a per-arch barrier for this particular case.
> > > > > >
> > > > > > I suppose we can only introduce new barrier primitives if
> > > > > > there's more than 1 use-case.
> > >
> > > Which barrier did you have in mind when you refer to `recent ARM'
> > > above? It seems to me like you'd need a combination of dmb ishld and
> > > dmb ishst, since the former doesn't order writes before writes.
> >
> > I heard a rumor that ARM had recently added a new dmb variant that
> > acted similarly to PowerPC's lwsync, and it was on my list to follow
> > up.
> >
> > Given your response, I am guessing that there is no truth to this
> > rumor...
>
> I think you're talking about the -ld option to dmb, which was introduced
> in ARMv8. That option orders loads against loads and stores, but doesn't
> order writes against writes. So you could do:
>
> 	dmb ishld
> 	dmb ishst
>
> but it's questionable whether that performs better than a dmb ish.
If Linus's smp_store_with_release_semantics() approach works out, ARM
should be able to use its shiny new ldar and stlr instructions.

							Thanx, Paul
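
A minimal user-space sketch of the release/acquire idea discussed in this thread, assuming C11 <stdatomic.h> in place of any kernel primitive. smp_store_with_release_semantics() is only a proposal at this point, so nothing below is the kernel's API or the perf implementation; struct ring, ring_produce(), ring_consume() and RING_SIZE are hypothetical names for a single-producer, single-consumer buffer. The acquire load of tail plays the role of "barrier A" (the data store cannot move ahead of the tail read), and the release store of head publishes the data. On ARMv8, compilers typically lower these to ldar and stlr.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8u	/* hypothetical capacity, must be a power of two */

static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be 2^n");

struct ring {
	_Atomic size_t head;	/* written only by the producer */
	_Atomic size_t tail;	/* written only by the consumer */
	int data[RING_SIZE];
};

/* Producer side: returns false when the buffer is full. */
static bool ring_produce(struct ring *r, int value)
{
	/* Only the producer writes head, so a relaxed load suffices. */
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	/*
	 * Acquire: the data store below cannot be reordered before this
	 * read of tail, without relying on the control dependency.
	 */
	size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RING_SIZE)
		return false;

	r->data[head % RING_SIZE] = value;
	/* Release: the data store is visible before the new head. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: returns false when the buffer is empty. */
static bool ring_consume(struct ring *r, int *out)
{
	/* Only the consumer writes tail, so a relaxed load suffices. */
	size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	/* Acquire: the data read below sees what the producer published. */
	size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;

	*out = r->data[tail % RING_SIZE];
	/* Release: the data read completes before the slot is handed back. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}
```

A caller would zero-initialize the structure (struct ring r = {0};) and then call ring_produce() from one thread and ring_consume() from another. Neither path needs a full barrier, which is the property the thread is after.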