Thread: 274 messages, 15 authors, 2014-03-07

Re: [RFC][PATCH 0/5] arch: atomic rework

From: Paul E. McKenney <hidden>
Date: 2014-02-14 02:01:54
Also in: lkml

On Thu, Feb 13, 2014 at 12:03:57PM -0800, Torvald Riegel wrote:
On Wed, 2014-02-12 at 16:23 -0800, Paul E. McKenney wrote:
On Wed, Feb 12, 2014 at 12:22:53PM -0800, Linus Torvalds wrote:
On Wed, Feb 12, 2014 at 10:07 AM, Paul E. McKenney
[off-list ref] wrote:
Us Linux-kernel hackers will often need to use volatile semantics in
combination with C11 atomics in most cases.  The C11 atomics do cover
some of the reasons we currently use ACCESS_ONCE(), but not all of them --
in particular, it allows load/store merging.
I really disagree with the "will need to use volatile".

We should never need to use volatile (outside of whatever MMIO we do
using C) if C11 defines atomics correctly.

Allowing load/store merging is *fine*. All sane CPU's do that anyway -
it's called a cache - and there's no actual reason to think that
"ACCESS_ONCE()" has to mean our current "volatile".

Now, it's possible that the C standards simply get atomics _wrong_, so
that they create visible semantics that are different from what a CPU
cache already does, but that's a plain bug in the standard if so.

But merging loads and stores is fine. And I *guarantee* it is fine,
exactly because CPU's already do it, so claiming that the compiler
couldn't do it is just insanity.
Agreed, both CPUs and compilers can merge loads and stores.  But CPUs
normally get their stores pushed through the store buffer in reasonable
time, and CPUs also use things like invalidations to ensure that a
store is seen in reasonable time by readers.  Compilers don't always
have these two properties, so we do need to be more careful of load
and store merging by compilers.
The standard's _wording_ is a little vague about forward-progress
guarantees, but I believe the vast majority of the people involved do
want compilers to not prevent forward progress.  There is of course a
difference whether a compiler establishes _eventual_ forward progress in
the sense of after 10 years or forward progress in a small bounded
interval of time, but this is a QoI issue, and good compilers won't want
to introduce unnecessary latencies.  I believe that it is fine if the
standard merely talks about eventual forward progress.
The compiler will need to earn my trust on this one.  ;-)
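
To make the compiler-side merging concern concrete, here is a minimal sketch, assuming a GCC- or Clang-compatible compiler, of a polling loop whose load is forced to happen on every iteration by a volatile cast modeled on the kernel's ACCESS_ONCE().  The names flag and poll_flag() are hypothetical.  With a plain int and no volatile access, the compiler would be permitted to merge the repeated loads into a single one.

```c
#include <assert.h>

/* Modeled on the Linux kernel's ACCESS_ONCE(); __typeof__ is a GCC/Clang extension. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

static int flag;	/* hypothetical shared variable, set by some other thread */

/*
 * Poll flag at most max_polls times.  The volatile access forces one
 * real load per iteration, so the compiler cannot merge the loads.
 * Returns the index of the poll that saw a nonzero value, or -1 if
 * none did.
 */
static int poll_flag(int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (ACCESS_ONCE(flag))
			return i;
	}
	return -1;
}
```

The volatile cast constrains only the compiler; how quickly another CPU's store becomes visible is still up to the hardware.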
Now, there are things that are *not* fine, like speculative stores
that could be visible to other threads. Those are *bugs* (either in
the compiler or in the standard), and anybody who claims otherwise is
not worth discussing with.
And as near as I can tell, volatile semantics are required in C11 to
avoid speculative stores.  I might be wrong about this, and hope that
I am wrong.  But I am currently not seeing it in the current standard.
(Though I expect that most compilers would avoid speculating stores,
especially in the near term.)
This really depends on how we define speculative stores.  The memory
model is absolutely clear that programs have to behave as if executed by
the virtual machine, and that rules out speculative stores to volatiles
and other locations.  Under certain circumstances, there will be
"speculative" stores in the sense that they will happen at different
times than if you had a trivial implementation of the abstract machine.
But to be allowed to do that, the compiler has to prove that such a
transformation still fulfills the as-if rule.
Agreed, although the as-if rule would ignore control dependencies, since
these are not yet part of the standard (as you in fact note below).
I nevertheless consider myself at least somewhat reassured that current
C11 won't speculate stores.  My remaining concerns involve the compiler
proving to itself that a given branch is always taken, thus motivating
it to optimize the branch away -- though this is more properly a
control-dependency concern.
IOW, the abstract machine is what currently defines disallowed
speculative stores.  If you want to put *further* constraints on what
implementations are allowed to do, I suppose it is best to talk about
those and see how we can add rules that allow programmers to express
those constraints.  For example, control dependencies might be such a
case.  I don't have a specific suggestion -- maybe the control
dependencies are best tackled similarly to consume dependencies (even
though we don't have a good solution for those yet).  But using
volatile accesses for that seems to be a big hammer, or even the wrong
one.
In current compilers, the two hammers we have are volatile and barrier().
But yes, it would be good to have something more focused.  One option
would be to propose memory_order_control loads to see how loudly the
committee screams.  One use case might be as follows:

	if (atomic_load_explicit(&x, memory_order_control))
		atomic_store_explicit(&y, 1, memory_order_relaxed);

This could also be written:

	r1 = atomic_load_explicit(&x, memory_order_control);
	if (r1)
		atomic_store_explicit(&y, 1, memory_order_relaxed);

A branch depending on the memory_order_control load could not be optimized
out, though I suppose that the compiler could substitute a memory-barrier
instruction for the branch.  Seems like it would take a very large number
of branches to equal the overhead of the memory barrier, though.
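
For comparison, here is a minimal sketch of what standard C11 already offers in place of the proposed memory_order_control (which does not exist in C11): an acquire load.  On weakly ordered CPUs this typically costs a barrier or a special load instruction, which is the overhead being weighed against the branch above.  The variables x and y and the function store_if_set_acquire() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int x, y;	/* hypothetical shared variables, zero-initialized */

/*
 * The acquire load keeps the later store to y from being reordered
 * before the load of x, so the ordering no longer depends on the
 * branch surviving optimization.
 * Returns the value loaded from x.
 */
static int store_if_set_acquire(void)
{
	int r1 = atomic_load_explicit(&x, memory_order_acquire);

	if (r1)
		atomic_store_explicit(&y, 1, memory_order_relaxed);
	return r1;
}
```

This is stronger than a control dependency needs to be: acquire also orders later loads, which a load-to-store control dependency does not.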

Another option would be to flag the conditional expression, prohibiting
the compiler from optimizing out any conditional branches.  Perhaps
something like this:

	r1 = atomic_load_explicit(&x, memory_order_control);
	if (control_dependency(r1))
		atomic_store_explicit(&y, 1, memory_order_relaxed);
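
Until something along these lines exists, the closest approximation with the two hammers mentioned earlier (volatile and barrier()) looks roughly like the sketch below.  It assumes GCC or Clang; ACCESS_ONCE() and barrier() are modeled on the kernel macros, and x, y, and store_if_set_volatile() are hypothetical.  The ordering here still relies on the CPU respecting the load-to-store control dependency, not on anything the C11 standard promises.

```c
#include <assert.h>

/* Modeled on the kernel macros; both rely on GCC/Clang extensions. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))
#define barrier() __asm__ __volatile__("" : : : "memory")

static int x, y;	/* hypothetical shared variables */

/*
 * The volatile load of x must be emitted exactly once, and the
 * volatile store to y must be emitted if and only if the branch is
 * taken.  barrier() stops the compiler from moving other memory
 * accesses across that point.
 * Returns the value loaded from x.
 */
static int store_if_set_volatile(void)
{
	int r1 = ACCESS_ONCE(x);

	if (r1) {
		barrier();
		ACCESS_ONCE(y) = 1;
	}
	return r1;
}
```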

Other thoughts?

							Thanx, Paul