[PATCH] usb: ehci: fix update qtd->token in qh_append_tds

From: Ming Lei <hidden>
Date: 2011-08-29 15:55:29
Also in: linux-omap

Hi,

On Mon, Aug 29, 2011 at 9:57 PM, Alan Stern [off-list ref] wrote:

On Mon, 29 Aug 2011, Russell King - ARM Linux wrote:


You know better than I do what is needed to resolve the ordering issue.
However, contrary to what the original patch description said, this
isn't entirely a matter of making the write visible to the host
controller: No doubt in time the write will eventually become visible
anyway. It's a matter of making the write become visible reasonably
quickly and in the correct order with respect to other writes.

I'm not entirely sure what the problem is - I think it's about a write
by the CPU to DMA coherent memory being delayed and not being visible
to the HC in a timely manner. Either mb() or wmb() placed after the
write on ARM will do that - and ARM has no requirement to do a
read-back after the barrier.

Okay, then this needs to be done in a way that won't slow down other
architectures with an unnecessary memory barrier. And there needs to
be a comment in the code explaining that the new mb() instruction isn't
being used as a memory barrier but rather to expedite writeback of the
L2 cache.

If writes to coherent memory can't reach physical memory immediately on
other architectures, the problem can still happen on those architectures.
But I am not sure whether any such architectures exist besides ARM.

Anyway, current memory barriers in qh_append_tds() can't prevent the problem
from happening on ARM.

If there is no better solution, maybe we have to use 'mb()' after
'dummy->hw_token = token' to fix the problem:
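
As a rough userspace illustration of that ordering (not the kernel patch
itself), the sketch below fills in a descriptor, issues a write barrier,
stores the token, and then issues a full barrier. The names fake_qtd,
fake_wmb(), fake_mb() and fake_activate_dummy() are hypothetical stand-ins.
C11 fences only model CPU-side ordering; they do not flush pending writes
out of an outer L2 cache, which is the side effect attributed to mb() on
ARM in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an EHCI qTD.
 * This is NOT the kernel's struct ehci_qtd. */
struct fake_qtd {
	uint32_t hw_next;          /* link to the next descriptor */
	uint32_t hw_buf;           /* buffer pointer */
	_Atomic uint32_t hw_token; /* status/control word read by the HC */
};

/* The Active bit in the qTD token is bit 7. */
#define FAKE_QTD_STS_ACTIVE (1u << 7)

/* Userspace approximations of the kernel barriers (assumption). */
static inline void fake_wmb(void) { atomic_thread_fence(memory_order_release); }
static inline void fake_mb(void)  { atomic_thread_fence(memory_order_seq_cst); }

/* Fill in the dummy descriptor, then hand it to the controller. */
static void fake_activate_dummy(struct fake_qtd *dummy, uint32_t next,
				uint32_t buf, uint32_t token)
{
	assert(dummy != NULL);

	dummy->hw_next = next;
	dummy->hw_buf = buf;

	/* Order the descriptor contents before the token store. */
	fake_wmb();
	atomic_store_explicit(&dummy->hw_token, token, memory_order_relaxed);

	/* The extra barrier under discussion: placed after the token store
	 * so that the store is pushed out rather than left pending. */
	fake_mb();
}
```

Whether the trailing barrier belongs in the driver or in arch-specific
code is exactly the open question in this thread; the sketch takes no
position on that.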

This certainly is starting to sound like something that needs to be
addressed in the arch-specific #include files...


Is this extra L2-cache "poke" needed for proper ordering, or is it
needed merely to flush the write out to memory in a timely manner?

Both, though primarily it's about ensuring correct ordering. A side
effect of it is that it will flush all pending writes in L2 before
completing.

From the theoretical viewpoint, I think I'm right to say that mb()
doesn't need to provide that level of ordering as it's supposed to be
an inter-CPU barrier - which probably means we need to invent a new
barrier to deal with DMA memory ordering. However, given the
difficulty of getting the existing barriers placed correctly, I don't
think inventing new barriers is a very good idea.

What we can do is view devices which perform DMA as being strongly
ordered with respect to their memory accesses - iow, they have an
implicit memory barrier before and after their accesses to memory.
This would make the CPU's use of mb() have a conceptual pairing with
the DMA agents.

Yes, that's the model I have been using all along. After all, if a DMA
master carries out its memory accesses in some random order then it's
impossible for the CPU to make any guarantees.

Alan Stern
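
The pairing model described above can be sketched in plain C11, with the
"device" reduced to a polling function called from the same program. All
names here (fake_desc, fake_cpu_publish(), fake_device_poll()) are
hypothetical, and the fence inside the device function merely stands in
for the implicit barrier a strongly ordered DMA master is assumed to have
around its accesses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor shared between the CPU and a DMA agent. */
struct fake_desc {
	uint32_t payload;       /* data the device must see intact */
	_Atomic uint32_t token; /* ownership flag polled by the device */
};

#define FAKE_ACTIVE (1u << 7)

/* CPU side: write the payload, barrier, then hand over ownership. */
static void fake_cpu_publish(struct fake_desc *d, uint32_t payload)
{
	assert(d != NULL);
	d->payload = payload;
	atomic_thread_fence(memory_order_seq_cst); /* stand-in for mb() */
	atomic_store_explicit(&d->token, FAKE_ACTIVE, memory_order_relaxed);
}

/* Device side: if the descriptor is active, consume it.
 * Returns true and stores the payload in *out when work was found. */
static bool fake_device_poll(struct fake_desc *d, uint32_t *out)
{
	assert(d != NULL && out != NULL);
	if (!(atomic_load_explicit(&d->token, memory_order_relaxed) & FAKE_ACTIVE))
		return false;

	/* Models the device's implicit barrier: the payload read may not
	 * be reordered ahead of the token read. */
	atomic_thread_fence(memory_order_seq_cst);
	*out = d->payload;

	/* Give the descriptor back by clearing the active bit. */
	atomic_store_explicit(&d->token, 0, memory_order_relaxed);
	return true;
}
```

If the device side had no such ordering, the barrier on the CPU side
would have nothing to pair with, which is the point made in the reply
above.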
