Re: bit fields && data tearing
From: Will Deacon <hidden>
Date: 2014-09-11 10:25:17
Also in: linux-arch, lkml
On Wed, Sep 10, 2014 at 10:48:06PM +0100, James Bottomley wrote:
On Tue, 2014-09-09 at 06:40 -0400, Peter Hurley wrote:
The processor is free to re-order this to:

	STORE C
	STORE B
	UNLOCK

That's because the unlock() only guarantees that:

Stores before the unlock in program order are guaranteed to complete before the unlock completes. Stores after the unlock _may_ complete before the unlock completes.

My point was that even if compiler barriers had the same semantics as memory barriers, the situation would be no worse. That is, code that is sensitive to memory barriers (like the example I gave above) would merely have the same fragility with one-way compiler barriers (with respect to the compiler combining writes). That's what I meant by "no worse than would otherwise exist".

Actually, that's not correct. This is actually deja vu with me on the other side of the argument. When we first did spinlocks on PA, I argued as you did: lock only a barrier for code after and unlock for code before. The failing case is that you can have a critical section which performs an atomically required operation and a following unit which depends on it being performed. If you begin the following unit before the atomic requirement, you may end up losing. It turns out this kind of pattern is inherent in a lot of mail box device drivers: you need to set up the mailbox atomically then poke it. Setup is usually atomic, deciding which mailbox to prime and actually poking it is in the following unit. Priming often involves an I/O bus transaction and if you poke before priming, you get a misfire.

Take it up with the man because this was discussed extensively last year and it was decided that unlocks would not be full barriers. Thus the changes to memory-barriers.txt that explicitly note this and the addition of smp_mb__after_unlock_lock() (for two different locks; an unlock followed by a lock on the same lock is a full barrier). Code that expects ordered writes after an unlock needs to explicitly add the memory barrier.

I don't really care what ARM does; spin locks are full barriers on architectures that need them. The driver problem we had that detected our semi permeable spinlocks was an LSI 53c875 which is enterprise class PCI, so presumably not relevant to ARM anyway.
FWIW, unlock is always fully ordered against non-relaxed IO accesses. We have pretty heavy barriers in readX/writeX to ensure this on ARM/arm64. PPC do tricks in their unlock to avoid the overhead on each IO access.

Will
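
For readers following the "prime the mailbox, then poke it" pattern discussed in this thread, a rough userspace analogue in C11 atomics may help. It is only a sketch and not kernel code: demo_lock(), demo_unlock(), mailbox_primed, doorbell and prime_and_poke() are hypothetical names, the release-ordered unlock stands in for an unlock that is not a full barrier, and the seq_cst fence stands in for the explicit barrier that code needing ordered writes after an unlock is expected to add. It does not model smp_mb__after_unlock_lock() or the barriers inside readX/writeX.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for illustration only; none of these are kernel APIs. */
static atomic_flag demo_lock_flag = ATOMIC_FLAG_INIT;
static int mailbox_primed;      /* set up inside the critical section */
static atomic_int doorbell;     /* stands in for the device "poke" */

static void demo_lock(void)
{
	/* Acquire: later accesses cannot be moved before taking the lock. */
	while (atomic_flag_test_and_set_explicit(&demo_lock_flag,
						 memory_order_acquire))
		;
}

static void demo_unlock(void)
{
	/*
	 * Release: earlier stores complete before the unlock, but stores
	 * that follow the unlock in program order may still be moved
	 * ahead of it.
	 */
	atomic_flag_clear_explicit(&demo_lock_flag, memory_order_release);
}

/* Returns 1 once the mailbox has been primed and the doorbell rung. */
static int prime_and_poke(void)
{
	demo_lock();
	mailbox_primed = 1;     /* the "atomically required" setup */
	demo_unlock();

	/*
	 * Explicit full fence: without it, nothing orders the relaxed
	 * doorbell store below after the priming store above.
	 */
	atomic_thread_fence(memory_order_seq_cst);

	atomic_store_explicit(&doorbell, 1, memory_order_relaxed);

	return mailbox_primed &&
	       atomic_load_explicit(&doorbell, memory_order_relaxed);
}
```

A single-threaded call to prime_and_poke() only exercises the code path; it cannot demonstrate the reordering itself, which depends on the compiler and the CPU. As noted above, a real driver poking a device through non-relaxed readX/writeX accessors already gets the ordering against unlock on ARM/arm64.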