Re: bit fields && data tearing
From: Paul E. McKenney <hidden>
Date: 2014-09-07 23:00:30
Also in:
linux-arch, lkml
On Sun, Sep 07, 2014 at 12:04:47PM -0700, James Bottomley wrote:
On Sun, 2014-09-07 at 09:21 -0700, Paul E. McKenney wrote:
On Sat, Sep 06, 2014 at 10:07:22PM -0700, James Bottomley wrote:
On Thu, 2014-09-04 at 21:06 -0700, Paul E. McKenney wrote:
On Thu, Sep 04, 2014 at 10:47:24PM -0400, Peter Hurley wrote:
Hi James,
On 09/04/2014 10:11 PM, James Bottomley wrote:
On Thu, 2014-09-04 at 17:17 -0700, Paul E. McKenney wrote:
+And there are anti-guarantees:
+
+ (*) These guarantees do not apply to bitfields, because compilers often
+     generate code to modify these using non-atomic read-modify-write
+     sequences.  Do not attempt to use bitfields to synchronize parallel
+     algorithms.
+
+ (*) Even in cases where bitfields are protected by locks, all fields
+     in a given bitfield must be protected by one lock.  If two fields
+     in a given bitfield are protected by different locks, the compiler's
+     non-atomic read-modify-write sequences can cause an update to one
+     field to corrupt the value of an adjacent field.
+
+ (*) These guarantees apply only to properly aligned and sized scalar
+     variables.  "Properly sized" currently means "int" and "long",
+     because some CPU families do not support loads and stores of
+     other sizes.  ("Some CPU families" is currently believed to
+     be only Alpha 21064.  If this is actually the case, a different
+     non-guarantee is likely to be formulated.)

This is a bit unclear. Presumably you're talking about definiteness of the outcome (as in what's seen after multiple stores to the same variable).

No, the last conditions refers to adjacent byte stores from different cpu contexts (either interrupt or SMP).

The guarantees are only for natural width on Parisc as well, so you would get a mess if you did byte stores to adjacent memory locations.

For a simple test like:

	struct x {
		long a;
		char b;
		char c;
		char d;
		char e;
	};

	void store_bc(struct x *p) {
		p->b = 1;
		p->c = 2;
	}

on parisc, gcc generates separate byte stores

	void store_bc(struct x *p) {
	   0:	34 1c 00 02	ldi 1,ret0
	   4:	0f 5c 12 08	stb ret0,4(r26)
	   8:	34 1c 00 04	ldi 2,ret0
	   c:	e8 40 c0 00	bv r0(rp)
	  10:	0f 5c 12 0a	stb ret0,5(r26)

which appears to confirm that on parisc adjacent byte data is safe from corruption by concurrent cpu updates; that is,

	CPU 0        | CPU 1
	             |
	p->b = 1     | p->c = 2
	             |

will result in p->b == 1 && p->c == 2 (assume both values were 0 before the call to store_bc()).

What Peter said. I would ask for suggestions for better wording, but I would much rather be able to say that single-byte reads and writes are atomic and that aligned-short reads and writes are also atomic. Thus far, it looks like we lose only very old Alpha systems, so unless I hear otherwise, I update my patch to outlaw these very old systems.

This isn't universally true according to the architecture manual. The PARISC CPU can make byte to long word stores atomic against the memory bus but not against the I/O bus for instance. Atomicity is a property of the underlying substrate, not of the CPU. Implying that atomicity is a CPU property is incorrect.

OK, fair point. But are there in-use-for-Linux PARISC memory fabrics (for normal memory, not I/O) that do not support single-byte and double-byte stores?

For aligned access, I believe that's always the case for the memory bus (on both 32 and 64 bit systems). However, it only applies to machine instruction loads and stores of the same width.. If you mix the widths on the loads and stores, all bets are off. That means you have to beware of the gcc penchant for coalescing loads and stores: if it sees two adjacent byte stores it can coalesce them into a short store instead ... that screws up the atomicity guarantees.

OK, that means that to make PARISC work reliably, we need to use
ACCESS_ONCE() for loads and stores that could have racing accesses.
If I understand correctly, this will -not- be needed for code guarded
by locks, even with Peter's examples.
So if we have something like this:
	struct foo {
		char a;
		char b;
	};
	struct foo *fp;
then this code would be bad:
	fp->a = 1;
	fp->b = 2;
The reason is (as you say) that GCC would be happy to merge the two
byte stores into a single 16-bit store of 0x0102 (or 0x0201, depending
on endianness) to the pair. We instead need:
	ACCESS_ONCE(fp->a) = 1;
	ACCESS_ONCE(fp->b) = 2;
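For anyone who wants to experiment outside the kernel, here is a rough userspace sketch of the two cases above. The ACCESS_ONCE() definition mirrors the kernel macro but relies on the GCC/Clang __typeof__ extension, and the helper names coalesced_value() and store_both() are made up for illustration. It assumes that the compiler does not merge or split volatile accesses, which is how GCC behaves in practice but is not something this sketch proves.

```c
#include <stdint.h>
#include <string.h>

/* Userspace approximation of the kernel macro (assumes GCC or Clang). */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct foo {
	char a;
	char b;
};

/* The sketch assumes the two chars are packed into one 16-bit unit. */
_Static_assert(sizeof(struct foo) == sizeof(uint16_t),
	       "struct foo is expected to occupy exactly two bytes");

/*
 * Hypothetical helper: returns the 16-bit value that a merged store of
 * a = 1, b = 2 would write (0x0102 big-endian, 0x0201 little-endian).
 */
static uint16_t coalesced_value(void)
{
	struct foo f = { .a = 1, .b = 2 };
	uint16_t v;

	memcpy(&v, &f, sizeof(v));
	return v;
}

/*
 * Hypothetical helper: each store goes through a volatile-qualified
 * lvalue, which asks the compiler to emit two separate byte stores.
 */
static void store_both(struct foo *fp)
{
	ACCESS_ONCE(fp->a) = 1;
	ACCESS_ONCE(fp->b) = 2;
}
```

Comparing the disassembly of store_both() with that of a plain-store version at -O2 is one way to see whether a given compiler and target actually coalesce the two byte stores.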
However, if the code is protected by locks, no problem:
	struct foo {
		spinlock_t lock_a;
		spinlock_t lock_b;
		char a;
		char b;
	};
Then it is OK to do the following:
	spin_lock(&fp->lock_a);
	fp->a = 1;
	spin_unlock(&fp->lock_a);
	spin_lock(&fp->lock_b);
	fp->b = 1;
	spin_unlock(&fp->lock_b);
Or even this, assuming ->lock_a precedes ->lock_b in the locking hierarchy:
	spin_lock(&fp->lock_a);
	spin_lock(&fp->lock_b);
	fp->a = 1;
	fp->b = 1;
	spin_unlock(&fp->lock_a);
	spin_unlock(&fp->lock_b);
Here gcc might merge the assignments to fp->a and fp->b, but that is OK
because both locks are held, presumably preventing other assignments or
references to fp->a and fp->b.
On the other hand, if either fp->a or fp->b is referenced outside of its
respective lock, even once, then this last code fragment would still need
ACCESS_ONCE() as follows:
	spin_lock(&fp->lock_a);
	spin_lock(&fp->lock_b);
	ACCESS_ONCE(fp->a) = 1;
	ACCESS_ONCE(fp->b) = 1;
	spin_unlock(&fp->lock_a);
	spin_unlock(&fp->lock_b);
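The both-locks-held case can be tried in userspace with something like the sketch below. It is only an approximation: pthread_mutex_t stands in for the kernel's spinlock_t, and struct foo_locked, foo_locked_init() and update_both() are names invented for this example. It assumes the ordering rule mentioned above (lock_a is always taken before lock_b) and that every access to ->a and ->b holds the corresponding lock, which is the condition under which plain stores are sufficient.

```c
#include <pthread.h>

/* Userspace stand-in: pthread_mutex_t plays the role of spinlock_t. */
struct foo_locked {
	pthread_mutex_t lock_a;	/* protects ->a */
	pthread_mutex_t lock_b;	/* protects ->b */
	char a;
	char b;
};

/* Hypothetical helper: set up both locks and zero both fields. */
static void foo_locked_init(struct foo_locked *fp)
{
	pthread_mutex_init(&fp->lock_a, NULL);
	pthread_mutex_init(&fp->lock_b, NULL);
	fp->a = 0;
	fp->b = 0;
}

/*
 * Hypothetical helper: acquire lock_a, then lock_b (the assumed lock
 * hierarchy), update both fields with plain stores, and release.
 * A merged store is harmless here only because both locks are held.
 */
static void update_both(struct foo_locked *fp)
{
	pthread_mutex_lock(&fp->lock_a);
	pthread_mutex_lock(&fp->lock_b);
	fp->a = 1;
	fp->b = 1;
	pthread_mutex_unlock(&fp->lock_b);
	pthread_mutex_unlock(&fp->lock_a);
}
```

The sketch releases the locks in reverse order of acquisition, which is a common convention rather than a correctness requirement; deadlock avoidance depends on the acquisition order only.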
Does that cover it? If so, I will update memory-barriers.txt accordingly.
Thanx, Paul