Thread (103 messages) 103 messages, 20 authors, 2014-09-25

Re: bit fields && data tearing

From: "H. Peter Anvin" <hpa@zytor.com>
Date: 2014-09-09 04:31:12
Also in: linux-arch, lkml

Add a short member for proper alignment and one will probably pop out.

Sent from my tablet, pardon any formatting problems.
On Sep 8, 2014, at 19:56, James Bottomley <James.Bottomley@HansenPartnersh=
ip.com> wrote:
=20
quoted
On Mon, 2014-09-08 at 19:30 -0400, Peter Hurley wrote:
On 09/08/2014 01:50 AM, James Bottomley wrote:
quoted
quoted
Two things: I think that gcc has given up on combining adjacent writes,=
quoted
quoted
quoted
perhaps because unaligned writes on some arches are prohibitive, so
whatever minor optimization was believed to be gained was quickly lost,=
quoted
quoted
quoted
multi-fold. (Although this might not be the reason since one would
expect that gcc would know the alignment of the promoted store).
=20
Um, gcc assumes architecturally correct alignment; that's why it pads
structures.  Only when accessing packed structures will it use the
lowest unit load/store.
=20
if you have a struct { char a, char b }; and load first a then b with a
constant gcc will obligingly optimise to a short store.
=20
Um, please re-look at the test above. The exact test you describe is
coded above and compiled with gcc 4.6.3 cross-compiler for parisc using
the kernel compiler options.
=20
In the generated code, please note the _absence_ of a combined write
to two adjacent byte stores.
=20
So one version of gcc doesn't do it.  Others do because I've been
surprised seeing it in assembly.
=20
quoted
quoted
quoted
But additionally, even if gcc combines adjacent writes _that are part
of the program flow_ then I believe the situation is no worse than
would otherwise exist.
=20
For instance, given the following:
=20
struct x {
   spinlock_t lock;
   long a;
   byte b;
   byte c;
};
=20
void locked_store_b(struct x *p)
{
   spin_lock(&p->lock);
   p->b =3D 1;
   spin_unlock(&p->lock);
   p->c =3D 2;
}
=20
Granted, the author probably expects ordered writes of
   STORE B
   STORE C
but that's not guaranteed because there is no memory barrier
ordering B before C.
=20
Yes, there is: loads and stores may not migrate into or out of critical
sections.
=20
That's a common misconception.
=20
The processor is free to re-order this to:
=20
   STORE C
   STORE B
   UNLOCK
=20
That's because the unlock() only guarantees that:
=20
Stores before the unlock in program order are guaranteed to complete
before the unlock completes. Stores after the unlock _may_ complete
before the unlock completes.
=20
My point was that even if compiler barriers had the same semantics
as memory barriers, the situation would be no worse. That is, code
that is sensitive to memory barriers (like the example I gave above)
would merely have the same fragility with one-way compiler barriers
(with respect to the compiler combining writes).
=20
That's what I meant by "no worse than would otherwise exist".
=20
Actually, that's not correct.  This is actually deja vu with me on the
other side of the argument.  When we first did spinlocks on PA, I argued
as you did: lock only a barrier for code after and unlock for code
before.  The failing case is that you can have a critical section which
performs an atomically required operation and a following unit which
depends on it being performed.  If you begin the following unit before
the atomic requirement, you may end up losing.  It turns out this kind
of pattern is inherent in a lot of mail box device drivers: you need to
set up the mailbox atomically then poke it.  Setup is usually atomic,
deciding which mailbox to prime and actually poking it is in the
following unit.  Priming often involves an I/O bus transaction and if
you poke before priming, you get a misfire.
=20
quoted
quoted
quoted
I see no problem with gcc write-combining in the absence of
memory barriers (as long as alignment is still respected,
ie., the size-promoted write is still naturally aligned). The
combined write is still atomic.
=20
Actual alignment is pretty irrelevant.  That's why all architectures
which require alignment also have to implement misaligned traps ... this=
quoted
quoted
is a fundamental requirement of the networking code, for instance.
=20
The main problem is gcc thinking there's a misalignment (or a packed
structure).  That causes splitting of loads and stores and that destroys=
quoted
quoted
atomicity.
=20
Yeah, the extra requirement I added is basically nonsense, since the
only issue is what instructions the compiler is emitting. So if compiler
thinks the alignment is natural and combines the writes -- ok. If the
compiler thinks the alignment is off and doesn't combine the writes --
also ok.
=20
Yes, I think I can agree that the only real problem is gcc thinking the
store or load needs splitting.
=20
James
=20
=20
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help