Thread (274 messages) 274 messages, 15 authors, 2014-03-07

Re: [RFC][PATCH 0/5] arch: atomic rework

From: Torvald Riegel <hidden>
Date: 2014-02-18 15:32:21
Also in: lkml

On Mon, 2014-02-17 at 16:05 -0800, Linus Torvalds wrote:
And exactly because I know enough, I would *really* like atomics to be
well-defined, and have very clear - and *local* - rules about how they
can be combined and optimized.
"Local"?
None of this "if you can prove that the read has value X" stuff. And
things like value speculation should simply not be allowed, because
that actually breaks the dependency chain that the CPU architects give
guarantees for. Instead, make the rules be very clear, and very
simple, like my suggestion. You can never remove a load because you
can "prove" it has some value, but you can combine two consecutive
atomic accesses/
Sorry, but the rules *are* very clear.  I *really* suggest to look at
the formalization by Batty et al.  And in these rules, proving that a
read will always return value X has a well-defined meaning, and you can
use it.  That simply follows from how the model is built.

What you seem to want just isn't covered by the model as it is today --
you can't infer from that that the model itself would be wrong.  The
dependency chains aren't modeled in the way you envision it (except in
what consume_mo tries, but that seems to be hard to implement); they are
there on the level of the program logic as modeled by the abstract
machine and the respective execution/output rules, but they are not
built to represent those specific ordering guarantees the HW gives you.

I would also be cautious claiming that the rules you suggested would be
very clear and very simple.  I haven't seen a memory model spec from you
that would be viable as the standard model for C/C++, nor have I seen
proof that this would actually be easier to understand for programmers
in general.
For example, CPU people actually do tend to give guarantees for
certain things, like stores that are causally related being visible in
a particular order.
Yes, but that's not part of the model so far.  If you want to exploit
this, please make a suggestion for how to extend the model to cover
this.

You certainly expect compilers to actually optimize code, which usually
removes or reorders stuff.  Now, we could say that we just don't want
that for atomic accesses, but even this isn't as clear-cut: How do we
actually detect where non-atomic code is intended by the programmer to
express (1) a dependency that needs to be transformed into a dependency
in the generated code from (2) a dependency that is just program logic?

IOW, consider the consume_mo example "*(p + flag - flag)", where flag is
coming from a consume_mo load: Can you give a complete set of rules (for
a full program) for when the compiler is allowed to optimize out
flag-flag and when not?  (And it should be practically implementable in
a compiler, and not prevent optimizations where people expect them.)
If the compiler starts doing value speculation on
atomic accesses, you are quite possibly breaking things like that.
It's just not a good idea. Don't do it. Write the standard so that it
clearly is disallowed.
You never want value speculation for anything possibly originating from
an atomic load?  Or what are the concrete rules you have in mind?
Because you may think that a C standard is machine-independent, but
that isn't really the case. The people who write code still write code
for a particular machine. Our code works (in the general case) on
different byte orderings, different register sizes, different memory
ordering models. But in each *instance* we still end up actually
coding for each machine.
That's how *you* do it.  I'm sure you are aware that for lots of
programmers, having a machine-dependent standard is not helpful at all.
As the standard is written, code is portable.  It can certainly be
beneficial for programmers to optimize for different machines, but the
core of the memory model will always remain portable, because that's
what arguably most programmers need.  That doesn't prevent
machine-dependent extensions, but those then should be well integrated
with the rest of the model.
So the rules for atomics should be simple and *specific* enough that
when you write code for a particular architecture, you can take the
architecture memory ordering *and* the C atomics orderings into
account, and do the right thing for that architecture.
That would be an extension, but not viable as a *general* requirement as
far as the standard is concerned.
And that very much means that doing things like value speculation MUST
NOT HAPPEN. See? Even if you can prove that your code is "equivalent",
it isn't.
I'm kind of puzzled why you keep stating generalizations of assertions
that clearly aren't well-founded (ie, which might be true for the kernel
or your wishes for how the world should be, but aren't true in general
or for the objectives of others).

It's not helpful to just pretend other people's opinions or worlds
didn't exist in discussions such as this one.  It's clear that the
standard has to consider all users of C/C++.  The kernel is a big user
of it, but also just that.

Why don't you just say that you do not *want* value speculation to
happen, because it does X and you'd like to exploit HW guarantee Y etc.?
That would be constructive, and actually correct in contrast to those
other broad claims.

(FWIW: Personally, I don't care whether you say something in a nice tone
or in some other way WITH BIG LETTERS.  I scan both for logically
consistent reasoning, and filter out the rest.  Therefore, avoiding
excessive amounts of the rest would make the discussion more efficient
for me, and I'd appreciate if everyone could work towards making it
efficient.  It's a tricky topic we're discussing, so I'd like to use my
brain for the actual technical bits and not for all the noise.)
So for example, let's say that you have a pointer, and you have some
reason to believe that the pointer has a particular value. So you
rewrite following the pointer from this:

  value = ptr->val;

into

  value = speculated->value;
  tmp = ptr;
  if (unlikely(tmp != speculated))
    value = tmp->value;

and maybe you can now make the critical code-path for the speculated
case go faster (since now there is no data dependency for the
speculated case, and the actual pointer chasing load is now no longer
in the critical path), and you made things faster because your
profiling showed that the speculated case was true 99% of the time.
Wonderful, right? And clearly, the code "provably" does the same
thing.
Please try to see that any proof is based on the language semantics as
specified.  Thus, you have to distinguish between (1) disagreeing with
the proof being correct and (2) disliking the language semantics.

I believe you mean the latter (let me know if you actually mean (1)); in
that case, it's the language semantics you want to be different, and we
need to discuss this.
EXCEPT THAT IS NOT TRUE AT ALL.

It very much does not do the same thing at all, and by doing value
speculation and "proving" something was true, the only thing you did
was to make incorrect code run faster. Because now the causally
related load of value from the pointer isn't actually causally related
at all, and you broke the memory ordering.
That's not how the language is specified, sorry.
This is why I don't like it when I see Torvald talk about "proving"
things. It's bullshit. You can "prove" pretty much anything, and in
the process lose sight of the bigger issue, namely that there is code
that depends on
Depends on what?

We really need to keep separate things separate here.  We started the
discussion with the topic of what behavior the memory model *as
specified today* allows.  In that model, the proofs are correct.

This model does not let programmers exploit the control dependencies and
such that you have in mind.  It does let programmers write correct code,
but this code might be somewhat less efficient (e.g., because it would
use memory_order_acquire instead of a weaker MO and control
dependencies).  If you want to discuss that, fine; we'd then need to see
how we can specify this properly and offer to programmers, and how to
extend the specification of the language semantics accordingly.

That Paul is looking through the uses of synchronization and how they
could (or could not) map to the memory model will be good input to this
discussion.
When it comes to atomic accesses, you don't play those kinds of games,
exactly because the ordering of the accesses matter in ways that are
not really sanely describable at a source code level.
They obviously need to be specifiable for a language that wants to
provide portable atomics.  For machine-dependent extensions, this could
take machine properties into accout, of course.  But even those need to
be modeled properly, at the source code level.  How else is a C/C++
compiler make sense out of it?
The *only* kinds
of games you play are like the ones I described - combining accesses
under very strict local rules.

And the strict local rules really should be of the type "a store
followed by a load to the same location with the same memory ordering
constraints can be combined". Never *ever* of the kind "if you can
prove X".
I doubt that's a better way to put it.  For example, you haven't defined
"followed" -- if you'd do that in a complete way, you'd inevitably come
to situations where something is checked for being provably correct.
What you do, in fact, is relying on the proof that the adjacent
store/load pair could always be executed atomically by the machine,
without preventing forward progress.
I hope my example made it clear *why* I react so strongly when Torvald
starts talking about "if you can prove the value is 1".
It's getting clearer to me, but as far as I can understand, from my
perspective, you're objecting to a straw man while not requesting the
thing you actually want.

It seems you're coming from a bottom-up perspective, seeing C code as
something defined like high-level assembler, which still has
machine-dependent parts and such.  The standard defines the language
semantics kind of the other way around, top-down from an abstract
machine, with memory_orders that at least to some extent allow one to
exploit different HW mechanisms with different costs for enforcing
ordering.

I can see why you have this perspective, but the standard needs the
top-down approach to suite all it's users.  I believe we should continue
discussing how we can bridge the gap between both views.  None of those
views is inherently wrong, they are just different.  No strong opinion
will change that, but I'm still hopeful that constructive discussion can
help bridge the gap.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help