[v3,11/41] mips: reuse asm-generic/barrier.h
From: Paul E. McKenney <hidden>
Date: 2016-01-15 17:54:30
Also in:
linux-arch, linux-mips, linux-s390, linux-sh, linux-um, linuxppc-dev, lkml, sparclinux, virtualization
On Fri, Jan 15, 2016 at 10:24:32AM +0000, Will Deacon wrote:
> On Thu, Jan 14, 2016 at 02:55:10PM -0800, Paul E. McKenney wrote:
> > On Thu, Jan 14, 2016 at 01:36:50PM -0800, Leonid Yegoshin wrote:
> > > On 01/14/2016 01:29 PM, Paul E. McKenney wrote:
> > > > > On 01/14/2016 12:34 PM, Paul E. McKenney wrote:
> > > > > > The WRC+addr+addr is OK because data dependencies are not
> > > > > > required to be transitive, in other words, they are not
> > > > > > required to flow from one CPU to another without the help
> > > > > > of an explicit memory barrier.
> > > > >
> > > > > I don't see any reliable way to fit WRC+addr+addr into "DATA
> > > > > DEPENDENCY BARRIERS" section recommendation to have data
> > > > > dependency barrier between read of a shared pointer/index
> > > > > and read the shared data based on that pointer. If you have
> > > > > this two reads, it doesn't matter the rest of scenario, you
> > > > > should put the dependency barrier in code anyway. If you
> > > > > don't do it in WRC+addr+addr scenario then after years it
> > > > > can be easily changed to different scenario which fits some
> > > > > of scenario in "DATA DEPENDENCY BARRIERS" section and fails.
> > > >
> > > > The trick is that lockless_dereference() contains an
> > > > smp_read_barrier_depends():
> > > >
> > > > #define lockless_dereference(p) \
> > > > ({ \
> > > >         typeof(p) _________p1 = READ_ONCE(p); \
> > > >         smp_read_barrier_depends(); /* Dependency order vs. p above. */ \
> > > >         (_________p1); \
> > > > })
> > > >
> > > > Or am I missing your point?
> > >
> > > WRC+addr+addr has no any barrier. lockless_dereference() has a
> > > barrier. I don't see a common points between this and that in
> > > your answer, sorry.
> >
> > Me, I am wondering what WRC+addr+addr has to do with anything at all.
>
> See my earlier reply [1] (but also, your WRC Linux example looks more
> like a variant on WWC and I couldn't really follow it).
I will revisit my WRC Linux example. And yes, creating litmus tests that use non-fake dependencies is still a bit of an undertaking. :-/ I am sure that it will seem more natural with time and experience...
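
For readers following along, the WRC (write-to-read causality) shape under discussion can be pinned down with a small C program. This is only an illustrative sketch, not the litmus test from the earlier mail: the three-CPU layout, the variable names, and the helpers (`struct state`, `step`, `explore`, `wrc_sc_count`) are all assumptions introduced here. It enumerates every sequentially consistent interleaving and counts the outcome r1 == 1 && r2 == 1 && r3 == 0, which is the outcome a WRC test asks about. It models no barriers or dependencies and says nothing about what any real hardware does.

```c
#include <assert.h>

/* Shared variables (x, y) and per-CPU registers of a generic WRC test. */
struct state { int x, y, r1, r2, r3; };

/* Number of memory operations each hypothetical CPU performs. */
static const int nsteps[3] = {1, 2, 2};

/* Execute operation number pc of the given CPU on state s. */
static void step(struct state *s, int cpu, int pc)
{
	switch (cpu) {
	case 0:			/* CPU 0: x = 1 */
		s->x = 1;
		break;
	case 1:			/* CPU 1: r1 = x; y = 1 */
		if (pc == 0)
			s->r1 = s->x;
		else
			s->y = 1;
		break;
	case 2:			/* CPU 2: r2 = y; r3 = x */
		if (pc == 0)
			s->r2 = s->y;
		else
			s->r3 = s->x;
		break;
	default:
		assert(0 && "unknown cpu");
	}
}

/* Depth-first walk over all interleavings that respect program order. */
static void explore(struct state s, int pc[3], int *total, int *forbidden)
{
	int done = 1;

	for (int cpu = 0; cpu < 3; cpu++) {
		if (pc[cpu] == nsteps[cpu])
			continue;
		done = 0;
		struct state next = s;	/* copy, so siblings see the old state */
		step(&next, cpu, pc[cpu]);
		pc[cpu]++;
		explore(next, pc, total, forbidden);
		pc[cpu]--;
	}
	if (done) {
		(*total)++;
		if (s.r1 == 1 && s.r2 == 1 && s.r3 == 0)
			(*forbidden)++;
	}
}

/* Returns how many SC interleavings show r1 == 1, r2 == 1, r3 == 0. */
int wrc_sc_count(int *total)
{
	struct state init = {0, 0, 0, 0, 0};
	int pc[3] = {0, 0, 0};
	int forbidden = 0;

	*total = 0;
	explore(init, pc, total, &forbidden);
	return forbidden;
}
```

Calling `wrc_sc_count(&total)` visits 30 interleavings and finds none with that outcome. Sequential consistency therefore forbids it, and the open question in this thread is whether a weaker machine forbids it too once barriers or dependencies are added.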
> > <Going back through earlier email>
> >
> > OK, so it looks like Will was asking not about WRC+addr+addr, but
> > instead about WRC+sync+addr. This would drop an smp_mb() into
> > cpu2() in my earlier example, which needs to provide ordering.
> >
> > I am guessing that the manual's "Older instructions which must be
> > globally performed when the SYNC instruction completes" provides
> > the equivalent of ARM/Power A-cumulativity, which can be thought of
> > as transitivity backwards in time.
>
> I couldn't make that leap. In particular, the manual's "Detailed
> Description" sections explicitly refer to program-order:
>
>   Every synchronizable specified memory instruction (loads or stores
>   or both) that occurs in the instruction stream before the SYNC
>   instruction must reach a stage in the load/store datapath after
>   which no instruction re-ordering is possible before any
>   synchronizable specified memory instruction which occurs after the
>   SYNC instruction in the instruction stream reaches the same stage
>   in the load/store datapath.
>
> Will
>
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/399765.html
All good points. I think we all agree that the MIPS documentation could
use significant help. And given that I work for the company that
produced the analogous documentation for PowerPC, that is saying
something. ;-)

We simply can't know if MIPS's memory ordering is sufficient for the
Linux kernel given its current implementation of the ordering primitives
and its current documentation. I feel a bit better than I did earlier
due to Leonid's response to my earlier litmus-test examples. But I do
recommend some serious stress testing of MIPS on a good set of litmus
tests. Much nicer finding issues that way than as random irreproducible
strange behavior!

							Thanx, Paul
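
As a rough illustration of the kind of stress testing recommended above, here is a minimal userspace sketch of a WRC-style test using pthreads and C11 atomics. Everything in it is an assumption made for illustration: the names (`cpu0`, `cpu1`, `cpu2`, `wrc_stress`) are hypothetical, the seq_cst fence merely stands in for smp_mb(), and the acquire load stands in for an address dependency, which C11 cannot express reliably. It exercises the C11 memory model rather than kernel primitives, and starting fresh threads on every iteration gives far less overlap than a dedicated litmus tool would, so a clean run proves very little about the hardware.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_int x, y;		/* shared locations of the test */
static int r1, r2, r3;		/* per-thread results, read after join */

static void *cpu0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	return NULL;
}

static void *cpu1(void *arg)
{
	(void)arg;
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* stand-in for smp_mb() */
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	return NULL;
}

static void *cpu2(void *arg)
{
	(void)arg;
	/* Acquire is a stand-in for (and stronger than) a dependency. */
	r2 = atomic_load_explicit(&y, memory_order_acquire);
	r3 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/* Runs the test repeatedly; returns how often r1==1, r2==1, r3==0 was seen. */
long wrc_stress(long iterations)
{
	void *(*fn[3])(void *) = { cpu0, cpu1, cpu2 };
	long forbidden = 0;

	for (long i = 0; i < iterations; i++) {
		pthread_t t[3];

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		for (int c = 0; c < 3; c++) {
			int rc = pthread_create(&t[c], NULL, fn[c], NULL);

			assert(rc == 0);
			(void)rc;
		}
		for (int c = 0; c < 3; c++)
			pthread_join(t[c], NULL);
		if (r1 == 1 && r2 == 1 && r3 == 0)
			forbidden++;
	}
	return forbidden;
}
```

With the fence and the acquire load in place, the C11 model forbids the counted outcome, so `wrc_stress()` should return 0 on a conforming implementation. Weakening either one turns the sketch into a probe for what a given machine actually does.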