Thread: 155 messages, 13 authors, 2016-04-14

Re: [v3,11/41] mips: reuse asm-generic/barrier.h

From: Peter Zijlstra <peterz@infradead.org>
Date: 2016-01-12 21:40:24
Also in: linux-arm-kernel, linux-mips, linux-s390, linux-sh, linux-um, linuxppc-dev, lkml, sparclinux

On Tue, Jan 12, 2016 at 12:45:14PM -0800, Leonid Yegoshin wrote:
> (I'll try to answer multiple mails in one.)
>
> First of all, it seems some general notes should be given here:
>
> 1. The generic MIPS "SYNC" (aka "SYNC 0") instruction is very heavy on some
> CPUs. On those CPUs it basically kills the pipeline of each CPU, can issue a
> special memory/IO bus transaction (similar to "fence") and holds the system
> until all reads and writes have completed. It is like the Big Kernel Lock,
> but worse. So the move to SMP_* kinds of barriers is needed to improve
> performance, especially on the newest CPUs with long pipelines.

The MIPS SYNC isn't any worse than the PPC SYNC, x86 MFENCE or ARM DSB
SY. Yes, they're heavy; so what?

> 2. The MIPS Arch document may be misleading because the words "ordering"
> and "completion" mean something different there than in Linux; the SYNC
> instruction description is written for HW engineers. I wrote about that in
> a separate patch of the same patchset -
> http://patchwork.linux-mips.org/patch/10505/ "MIPS: R6: Use lightweight
> SYNC instruction in smp_* memory barriers":

Did you actually say anything here?

> > These instructions were specifically designed to work for the smp_*()
> > sort of memory barriers in MIPS R2/R3/R5 and R6.
> >
> > Unfortunately, their description is very cryptic and written in a HW
> > engineering style, which prevents their use by SW.
>
> 3. I bothered the MIPS Arch team for a long time until I completely
> understood that the MIPS SYNC_WMB, SYNC_MB, SYNC_RMB, SYNC_RELEASE and
> SYNC_ACQUIRE do exactly what is required by
> Documentation/memory-barriers.txt.

Ha! And you think that document covers all the really fun details?

In particular we're very much all 'confused' about the various notions
of transitivity and what barriers imply how much of it.
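For reference, the lightweight SYNC variants named in point 3 are selected by the stype field of the SYNC instruction. The C sketch below records the stype values and their ordering scope roughly as the MIPS32/MIPS64 Release 2+ manuals describe them, and pairs them with Linux barrier names. The pairing and the helper `sync_stype_for()` are hypothetical illustrations, not the code of patch 10505; whether 0x11/0x12 are strong enough for smp_mb__{before,after}() is exactly what this thread is arguing about.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* SYNC stype values (MIPS32/MIPS64 Release 2 and later). */
enum mips_sync_stype {
	STYPE_SYNC_0       = 0x00, /* full completion barrier ("SYNC 0")          */
	STYPE_SYNC_WMB     = 0x04, /* stores before vs. stores after               */
	STYPE_SYNC_MB      = 0x10, /* loads+stores before vs. loads+stores after   */
	STYPE_SYNC_ACQUIRE = 0x11, /* loads before vs. loads+stores after          */
	STYPE_SYNC_RELEASE = 0x12, /* loads+stores before vs. stores after         */
	STYPE_SYNC_RMB     = 0x13, /* loads before vs. loads after                 */
};

/* Hypothetical pairing of Linux barrier names with SYNC stypes. */
static const struct {
	const char *barrier;
	enum mips_sync_stype stype;
} sync_map[] = {
	{ "smp_mb",            STYPE_SYNC_MB },
	{ "smp_rmb",           STYPE_SYNC_RMB },
	{ "smp_wmb",           STYPE_SYNC_WMB },
	{ "smp_load_acquire",  STYPE_SYNC_ACQUIRE }, /* SYNC goes after the load   */
	{ "smp_store_release", STYPE_SYNC_RELEASE }, /* SYNC goes before the store */
};

/* Return the stype for `barrier`; unknown names fall back to SYNC 0. */
static enum mips_sync_stype sync_stype_for(const char *barrier)
{
	assert(barrier != NULL);
	for (size_t i = 0; i < sizeof(sync_map) / sizeof(sync_map[0]); i++)
		if (strcmp(sync_map[i].barrier, barrier) == 0)
			return sync_map[i].stype;
	return STYPE_SYNC_0;
}
```

Falling back to SYNC 0 for anything not listed (for example the mandatory mb()) keeps the sketch on the conservative side.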
> In Peter Zijlstra's mail:
> > 1) you do not make such things selectable; either the hardware needs
> > them or it doesn't. If it does you _must_ use them, however unlikely.
>
> It is selectable only for MIPS R2, not for MIPS R6. The reason is that most
> MIPS R2 CPUs have a short pipeline and that SYNC is just a waste of CPU
> resources, especially taking into account that "lightweight syncs" are
> converted to a heavy "SYNC 0" on many of those CPUs. However, the latest
> MIPS/Imagination CPUs have a pipeline long enough to hit the problem: the
> absence of SYNC around LL/SC inside atomics, barriers, etc.

What?! Are you saying that because R2 has short pipelines it's unlikely
to hit the reordering issues and we can omit barriers?
> > And reading the MIPS64 v6.04 instruction set manual, I think 0x11/0x12
> > are _NOT_ transitive and therefore cannot be used to implement the
> > smp_mb__{before,after} stuff.
> >
> > That is, in MIPS speak, those SYNC types are Ordering Barriers, not
> > Completion Barriers.
>
> Please see above, point 2.

That did not in fact enlighten things. Are they transitive/multi-copy
atomic or not?

(and here Will will go into great detail on the differences between the
two and make our collective brains explode :-)
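The multi-copy atomicity question can be phrased as the IRIW litmus test ("independent reads of independent writes"): if two readers can disagree on the order of two independent stores, those stores did not become visible to all CPUs at once. Below is a C11/pthreads sketch. With the default seq_cst loads and stores, C11 forbids the disagreeing outcome, so the count must stay at zero; weakening the loads to memory_order_acquire makes the outcome allowed in C11 (and it is observable on POWER). The sketch says nothing about what MIPS SYNC 0x11/0x12 actually guarantee, and all names in it are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Shared locations and per-run results (results are read after join). */
static atomic_int x, y;
static int r0, r1, r2, r3;

static void *writer_x(void *arg) { (void)arg; atomic_store(&x, 1); return NULL; }
static void *writer_y(void *arg) { (void)arg; atomic_store(&y, 1); return NULL; }

/* Reader A observes x, then y. */
static void *reader_xy(void *arg)
{
	(void)arg;
	r0 = atomic_load(&x);
	r1 = atomic_load(&y);
	return NULL;
}

/* Reader B observes y, then x. */
static void *reader_yx(void *arg)
{
	(void)arg;
	r2 = atomic_load(&y);
	r3 = atomic_load(&x);
	return NULL;
}

/*
 * Run IRIW `iters` times and count runs where reader A saw x's store but
 * not y's, while reader B saw y's store but not x's.
 */
static int iriw_disagreement_count(int iters)
{
	static void *(*const fns[4])(void *) = {
		writer_x, writer_y, reader_xy, reader_yx
	};
	int hits = 0;

	assert(iters > 0);
	for (int n = 0; n < iters; n++) {
		pthread_t t[4];

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		for (int i = 0; i < 4; i++) {
			int rc = pthread_create(&t[i], NULL, fns[i], NULL);
			assert(rc == 0);
			(void)rc;
		}
		for (int i = 0; i < 4; i++)
			pthread_join(t[i], NULL);
		if (r0 == 1 && r1 == 0 && r2 == 1 && r3 == 0)
			hits++;
	}
	return hits;
}
```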
> > That is, currently all architectures -- with the exception of PPC -- have
> > RCsc locks, but using these non-transitive things will get you RCpc
> > locks.
> >
> > So yes, MIPS can go RCpc for its locks and share the burden of pain with
> > PPC, but that needs to be a very conscious decision.
>
> I don't understand that. I tried hard, but I can't find words like "RCsc"
> or "RCpc" anywhere in the Documentation/ directory. A web search goes
> nowhere, of course.

From: lkml.kernel.org/r/20150828153921.GF19282@twins.programming.kicks-ass.net

Yes, the difference between RCpc and RCsc is in the meaning of RELEASE +
ACQUIRE. With RCsc that implies a full memory barrier, with RCpc it does
not.

Currently PowerPC is the only arch that (can, and) does RCpc and gives a
weaker RELEASE + ACQUIRE. Only the CPU who did the ACQUIRE is guaranteed
to see the stores of the CPU which did the RELEASE in order.

As it stands, RCU is the only _known_ codebase where this matters, but
we did in fact write code for a fair number of years 'assuming' RELEASE
+ ACQUIRE was a full barrier, so who knows what else is out there.


RCsc - release consistency, with sequentially consistent acquire/release operations
RCpc - release consistency, with processor-consistent acquire/release operations

https://en.wikipedia.org/wiki/Processor_consistency
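The RELEASE + ACQUIRE difference can be sketched as a store-buffering test in C11, whose release/acquire operations behave like RCpc: a store-release followed by a load-acquire of a different variable does not order an earlier store against a later load. Each simulated CPU below does a plain store, a RELEASE, an ACQUIRE and a plain load. The RELEASE and ACQUIRE are modeled as a store-release and a load-acquire on per-CPU variables rather than real lock operations, and `full_barrier` adds a seq_cst fence after the ACQUIRE, in the spirit of the kernel's smp_mb__after_unlock_lock(). All names are hypothetical. With the fence, r0 == 0 && r1 == 0 is forbidden (RCsc-like); without it, C11 allows that outcome.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static atomic_int x, y;                   /* data exchanged by the two CPUs   */
static atomic_int rel0, acq0, rel1, acq1; /* per-CPU stand-ins for lock words */
static int full_barrier;                  /* written before threads start     */
static int r0, r1;                        /* results, read after pthread_join */

/* CPU 0: store x; RELEASE; ACQUIRE; optional full fence; load y. */
static void *cpu0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&x, 1, memory_order_relaxed);
	atomic_store_explicit(&rel0, 1, memory_order_release);   /* RELEASE */
	(void)atomic_load_explicit(&acq0, memory_order_acquire); /* ACQUIRE */
	if (full_barrier)
		atomic_thread_fence(memory_order_seq_cst);        /* RCsc-like */
	r0 = atomic_load_explicit(&y, memory_order_relaxed);
	return NULL;
}

/* CPU 1: the mirror image, storing y and loading x. */
static void *cpu1(void *arg)
{
	(void)arg;
	atomic_store_explicit(&y, 1, memory_order_relaxed);
	atomic_store_explicit(&rel1, 1, memory_order_release);
	(void)atomic_load_explicit(&acq1, memory_order_acquire);
	if (full_barrier)
		atomic_thread_fence(memory_order_seq_cst);
	r1 = atomic_load_explicit(&x, memory_order_relaxed);
	return NULL;
}

/* Run the test `iters` times; count runs ending with r0 == 0 && r1 == 0. */
static int sb_both_zero_count(int iters, int with_full_barrier)
{
	int hits = 0;

	assert(iters > 0);
	full_barrier = with_full_barrier;
	for (int n = 0; n < iters; n++) {
		pthread_t t0, t1;
		int rc0, rc1;

		atomic_store(&x, 0);
		atomic_store(&y, 0);
		rc0 = pthread_create(&t0, NULL, cpu0, NULL);
		rc1 = pthread_create(&t1, NULL, cpu1, NULL);
		assert(rc0 == 0 && rc1 == 0);
		(void)rc0;
		(void)rc1;
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		if (r0 == 0 && r1 == 0)
			hits++;
	}
	return hits;
}
```

Without the fence a nonzero count is permitted but not guaranteed. On x86, where these accesses compile to plain moves, the store buffer can produce it, although creating fresh threads for every run makes the two CPUs rarely overlap.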