Thread: 103 messages, 20 authors, 2014-09-25

Re: bit fields && data tearing

From: James Bottomley <James.Bottomley@HansenPartnership.com>
Date: 2014-09-10 21:48:13
Also in: linux-arch, lkml

On Tue, 2014-09-09 at 06:40 -0400, Peter Hurley wrote:
On 09/08/2014 10:56 PM, James Bottomley wrote:
On Mon, 2014-09-08 at 19:30 -0400, Peter Hurley wrote:
On 09/08/2014 01:50 AM, James Bottomley wrote:
But additionally, even if gcc combines adjacent writes _that are part
of the program flow_ then I believe the situation is no worse than
would otherwise exist.

For instance, given the following:

struct x {
	spinlock_t lock;
	long a;
	byte b;
	byte c;
};

void locked_store_b(struct x *p)
{
	spin_lock(&p->lock);
	p->b = 1;
	spin_unlock(&p->lock);
	p->c = 2;
}

Granted, the author probably expects ordered writes of
	STORE B
	STORE C
but that's not guaranteed because there is no memory barrier
ordering B before C.
Yes, there is: loads and stores may not migrate into or out of critical
sections.
That's a common misconception.

The processor is free to re-order this to:

	STORE C
	STORE B
	UNLOCK

That's because the unlock() only guarantees that:

Stores before the unlock in program order are guaranteed to complete
before the unlock completes. Stores after the unlock _may_ complete
before the unlock completes.

My point was that even if compiler barriers had the same semantics
as memory barriers, the situation would be no worse. That is, code
that is sensitive to memory barriers (like the example I gave above)
would merely have the same fragility with one-way compiler barriers
(with respect to the compiler combining writes).

That's what I meant by "no worse than would otherwise exist".
Actually, that's not correct.  This is actually deja vu with me on the
other side of the argument.  When we first did spinlocks on PA, I argued
as you did: lock is a barrier only for code after it, and unlock only
for code before it.  The failing case is a critical section which
performs an operation that must be atomic, followed by a unit which
depends on that operation having been performed.  If you begin the
following unit before the atomic operation has completed, you may end
up losing.  It turns out this kind of pattern is inherent in a lot of
mailbox device drivers: you need to set up the mailbox atomically, then
poke it.  Setup is usually atomic; deciding which mailbox to prime and
actually poking it happen in the following unit.  Priming often involves
an I/O bus transaction, and if you poke before priming, you get a
misfire.
Take it up with the man because this was discussed extensively last
year and it was decided that unlocks would not be full barriers.
Thus the changes to memory-barriers.txt that explicitly note this
and the addition of smp_mb__after_unlock_lock() (for two different
locks; an unlock followed by a lock on the same lock is a full barrier).

Code that expects ordered writes after an unlock needs to explicitly
add the memory barrier.
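
As a rough illustration of that last point, here is a userspace sketch
using C11 atomics rather than kernel primitives. Everything in it is an
assumption made for illustration: x_lock() and x_unlock() are
hypothetical helpers built on atomic_flag, the byte fields are declared
as _Atomic unsigned char so the stores are well defined in C11, and
atomic_thread_fence(memory_order_seq_cst) stands in for the role an
explicit smp_mb() would play after spin_unlock() in kernel code.

```c
#include <stdatomic.h>

/* Userspace stand-in for the struct from the thread; not kernel code. */
struct x {
	atomic_flag lock;          /* hypothetical minimal spinlock */
	long a;
	_Atomic unsigned char b;
	_Atomic unsigned char c;
};

/* Hypothetical helper: acquire semantics only, so later accesses
 * cannot be reordered before the lock is taken. */
static void x_lock(struct x *p)
{
	while (atomic_flag_test_and_set_explicit(&p->lock,
						 memory_order_acquire))
		;	/* spin */
}

/* Hypothetical helper: release semantics only, so earlier accesses
 * cannot be reordered after the unlock. Later accesses may still
 * move up past it. */
static void x_unlock(struct x *p)
{
	atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

void locked_store_b_then_c(struct x *p)
{
	x_lock(p);
	atomic_store_explicit(&p->b, 1, memory_order_relaxed);
	x_unlock(p);

	/* Explicit full fence: without it, the store to c is allowed
	 * to become visible before the store to b. */
	atomic_thread_fence(memory_order_seq_cst);

	atomic_store_explicit(&p->c, 2, memory_order_relaxed);
}
```

An observer that reads c, issues its own fence, and then reads b would
then see b == 1 whenever it saw c == 2. Without the fence in the writer,
the C11 model gives no such guarantee.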
I don't really care what ARM does; spin locks are full barriers on
architectures that need them.  The driver problem we had that detected
our semipermeable spinlocks was an LSI 53c875, which is enterprise-class
PCI, so presumably not relevant to ARM anyway.

James