Thread: 30 messages, 8 authors, 2007-08-16

Re: [PATCH 6/24] make atomic_read() behave consistently on frv

From: Paul E. McKenney <hidden>
Date: 2007-08-14 17:02:05
Also in: lkml, netdev

On Tue, Aug 14, 2007 at 03:34:25PM +1000, Nick Piggin wrote:
Paul E. McKenney wrote:
On Mon, Aug 13, 2007 at 01:15:52PM +0800, Herbert Xu wrote:
Paul E. McKenney [off-list ref] wrote:
On Sat, Aug 11, 2007 at 08:54:46AM +0800, Herbert Xu wrote:
Chris Snook [off-list ref] wrote:
cpu_relax() contains a barrier, so it should do the right thing.  For 
non-smp architectures, I'm concerned about interacting with interrupt 
handlers.  Some drivers do use atomic_* operations.
What problems with interrupt handlers? Access to int/long must
be atomic or we're in big trouble anyway.
Reordering due to compiler optimizations.  CPU reordering does not
affect interactions with interrupt handlers on a given CPU, but
reordering due to compiler code-movement optimization does.  Since
volatile can in some cases suppress code-movement optimizations,
it can affect interactions with interrupt handlers.
If such reordering matters, then you should use one of the
*mb macros or barrier() rather than relying on possibly
a hidden volatile cast.

If communicating among CPUs, sure.  However, when communicating between
mainline and interrupt/NMI handlers on the same CPU, the barrier() and
most especially the *mb() macros are gross overkill.  So there really
truly is a place for volatile -- not a large place, to be sure, but a
place nonetheless.
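
A minimal userspace sketch of the same-CPU case, assuming a POSIX signal
handler is an acceptable stand-in for an interrupt handler.  The names
event_seen, on_signal() and wait_for_event() are made up for illustration
and do not come from the kernel.  The volatile qualifier is what keeps
the compiler from hoisting the load of the flag out of the loop:

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical flag shared by mainline code and a handler on one CPU. */
static volatile sig_atomic_t event_seen;

/* Stand-in for an interrupt handler: it only sets the flag. */
static void on_signal(int signo)
{
	(void)signo;
	event_seen = 1;
}

/* Returns 1 once the handler has run, 0 if it could not be installed. */
int wait_for_event(void)
{
	int rc;

	if (signal(SIGUSR1, on_signal) == SIG_ERR)
		return 0;

	rc = raise(SIGUSR1);	/* handler runs before raise() returns */
	assert(rc == 0);
	(void)rc;

	/*
	 * No barrier() and no *mb() here: the volatile access alone
	 * forces a fresh load of event_seen on every pass.
	 */
	while (!event_seen)
		;
	return 1;
}
```

Without the volatile qualifier, the compiler would be free to load the
flag once and spin on the cached value.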
I really would like all volatile users to go away and be replaced
by explicit barriers. It makes things nicer and more explicit... for
atomic_t type there probably aren't many optimisations that can be
made which volatile would disallow (in actual kernel code), but for
others (eg. bitops, maybe atomic ops in UP kernels), there would be.

Maybe it is the safe way to go, but it does obscure cases where there
is a real need for barriers.
I prefer burying barriers into other primitives.
Many atomic operations are allowed to be reordered between CPUs, so
I don't have a good idea for the rationale to order them within the
CPU (also loads and stores to long and ptr types are not ordered like
this, although we do consider those to be atomic operations too).

barrier() in a way is like enforcing sequential memory ordering
between process and interrupt context, whereas volatile is just
enforcing coherency of a single memory location (and as such is
cheaper).
barrier() is useful, but it has the very painful side-effect of forcing
the compiler to dump temporaries.  So we do need something that is
not quite so global in effect.
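
A rough sketch of the difference in scope, assuming GCC-style inline asm.
The barrier() definition is the usual empty asm with a "memory" clobber.
The variables flag and total and both functions are hypothetical.  What
the compiler actually keeps in registers depends on the compiler version
and optimization level:

```c
#include <assert.h>

/* Compiler-only barrier: everything in memory is assumed clobbered. */
#define barrier() __asm__ __volatile__("" : : : "memory")

static int flag;	/* hypothetical location a handler might watch */
static int total;	/* unrelated global used as an accumulator */

int sum_with_barrier(const int *v, int n)
{
	int i;

	assert(n >= 0);
	total = 0;
	for (i = 0; i < n; i++) {
		total += v[i];
		/* total must be written back and reloaded on each pass */
		barrier();
	}
	return total;
}

int sum_with_volatile_store(const int *v, int n)
{
	int i;

	assert(n >= 0);
	total = 0;
	for (i = 0; i < n; i++) {
		total += v[i];
		/* only this store is pinned; total may stay in a register */
		*(volatile int *)&flag = i;
	}
	return total;
}
```

Both functions return the same sum.  The difference shows up only in the
generated code, where the barrier() version cannot cache total across
iterations.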
What do you think of this crazy idea?

/* Enforce a compiler barrier for only operations to location X.
 * Call multiple times to provide an ordering between multiple
 * memory locations. Other memory operations can be assumed by
 * the compiler to remain unchanged and may be reordered
 */
#define order(x) asm volatile("" : "+m" (x))
There was something very similar discussed earlier in this thread,
with quite a bit of debate as to exactly what the "m" flag should
look like.  I suggested something similar named ACCESS_ONCE in the
context of RCU (http://lkml.org/lkml/2007/7/11/664):

	#define ACCESS_ONCE(x) (*(volatile typeof(x) *)&(x))

The nice thing about this is that it works for both loads and stores.
Not clear that order() above does this -- I get compiler errors when
I try something like "b = order(a)" or "order(a) = 1" using gcc 4.1.2.
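
A short sketch of how the two macros can be used, assuming GCC.  The
macros are spelled here with __typeof__ and __asm__ so that the example
also builds under a strict -std= setting.  The variable a and both
function names are illustrative only.  The errors mentioned above are
consistent with order() expanding to an asm statement, which cannot
appear where an expression is expected:

```c
#include <assert.h>

/* Macros from the thread, spelled with GCC's double-underscore keywords. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))
#define order(x) __asm__ __volatile__("" : "+m" (x))

static int a;	/* hypothetical shared variable */

/* ACCESS_ONCE() yields an lvalue, so it works on either side of '='. */
int access_once_roundtrip(int v)
{
	int b;

	ACCESS_ONCE(a) = v;	/* forced store */
	b = ACCESS_ONCE(a);	/* forced load */
	assert(b == v);
	return b;
}

/* order() is a statement, so it has to stand on a line of its own. */
int order_roundtrip(int v)
{
	int b;

	a = v;
	order(a);	/* compiler must treat a as read and written here */
	b = a;
	order(a);
	return b;
}
```

So order() can still constrain both loads and stores, but only as a
separate statement placed next to the plain access, whereas ACCESS_ONCE()
wraps the access itself.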

						Thanx, Paul