Thread (103 messages, 20 authors, 2014-09-25)

Re: bit fields && data tearing

From: Will Deacon <hidden>
Date: 2014-09-11 10:25:17
Also in: linux-arch, lkml

On Wed, Sep 10, 2014 at 10:48:06PM +0100, James Bottomley wrote:
> On Tue, 2014-09-09 at 06:40 -0400, Peter Hurley wrote:
> > > > The processor is free to re-order this to:
> > > >
> > > > 	STORE C
> > > > 	STORE B
> > > > 	UNLOCK
> > > >
> > > > That's because the unlock() only guarantees that:
> > > >
> > > > Stores before the unlock in program order are guaranteed to complete
> > > > before the unlock completes. Stores after the unlock _may_ complete
> > > > before the unlock completes.
> > > >
> > > > My point was that even if compiler barriers had the same semantics
> > > > as memory barriers, the situation would be no worse. That is, code
> > > > that is sensitive to memory barriers (like the example I gave above)
> > > > would merely have the same fragility with one-way compiler barriers
> > > > (with respect to the compiler combining writes).
> > > >
> > > > That's what I meant by "no worse than would otherwise exist".

> > > Actually, that's not correct.  This is actually deja vu with me on the
> > > other side of the argument.  When we first did spinlocks on PA, I argued
> > > as you did: lock is only a barrier for code after it, and unlock only
> > > for code before it.  The failing case is that you can have a critical
> > > section which performs an atomically required operation and a following
> > > unit which depends on it being performed.  If you begin the following
> > > unit before the atomically required operation has completed, you may
> > > end up losing.  It turns out this kind of pattern is inherent in a lot
> > > of mailbox device drivers: you need to set up the mailbox atomically,
> > > then poke it.  Setup is usually atomic; deciding which mailbox to prime
> > > and actually poking it is in the following unit.  Priming often involves
> > > an I/O bus transaction, and if you poke before priming, you get a misfire.

> > Take it up with the man, because this was discussed extensively last
> > year and it was decided that unlocks would not be full barriers.
> > Thus the changes to memory-barriers.txt that explicitly note this,
> > and the addition of smp_mb__after_unlock_lock() (for two different
> > locks; an unlock followed by a lock on the same lock is a full barrier).
> >
> > Code that expects ordered writes after an unlock needs to add the
> > memory barrier explicitly.

> I don't really care what ARM does; spin locks are full barriers on
> architectures that need them.  The driver problem that exposed our
> semi-permeable spinlocks was an LSI 53c875, which is enterprise-class
> PCI, so presumably not relevant to ARM anyway.

FWIW, unlock is always fully ordered against non-relaxed IO accesses. We
have pretty heavy barriers in readX/writeX to ensure this on ARM/arm64.

PPC do tricks in their unlock to avoid the overhead on each IO access.

Will