Thread: 42 messages, 11 authors, 2018-04-02

Re: RFC on writel and writel_relaxed

From: Will Deacon <hidden>
Date: 2018-03-28 09:07:01
Also in: linux-rdma, linuxppc-dev

On Wed, Mar 28, 2018 at 05:42:56PM +1100, Benjamin Herrenschmidt wrote:
> On Tue, 2018-03-27 at 20:26 -1000, Linus Torvalds wrote:
> > On Tue, Mar 27, 2018 at 6:33 PM, Benjamin Herrenschmidt
> > [off-list ref] wrote:
> > > This is why, I want (with your agreement) to define clearly and once
> > > and for all, that the Linux semantics of writel are that it is ordered
> > > with previous writes to coherent memory (*)
> >
> > Honestly, I think those are the sane semantics. In fact, make it
> > "ordered with previous writes" full stop, since it's not only ordered
> > wrt previous writes to memory, but also previous writel's.
>
> Of course. It was somewhat a given that it's ordered vs. any previous
> MMIO actually, but it doesn't hurt to spell it out once more.

Good. So I think this confirms our understanding so far.

> > > Also, can I assume the above ordering with writel() equally applies to
> > > readl() or not ?
> > >
> > > IE:
> > >         dma_buf->foo = 1;
> > >         readl(STUPID_DEVICE_DMA_KICK_ON_READ);
> >
> > If that KICK_ON_READ is UC, then that's definitely the case. And
> > honestly, status registers like that really should always be UC.
> >
> > But if somebody sets the area WC (which is crazy), then I think it
> > might be at least debatable. x86 semantics does allow reads to be done
> > before previous writes (or, put another way, writes to be buffered -
> > the buffers are ordered so writes don't get re-ordered, but reads can
> > happen during the buffering).
>
> Right, for now I worry about UC semantics. Once we have nailed that, we
> can look at WC, which is a lot more tricky as archs differ more
> widely, but one thing at a time.
>
> > But UC accesses are always done entirely ordered, and honestly, any
> > status register that starts a DMA would not make sense any other way.
> >
> > Of course, you'd have to be pretty odd to want to start a DMA with a
> > read anyway - partly exactly because it's bad for performance since
> > reads will be synchronous and not buffered like a write.
>
> I have bad memories of old adaptec controllers ...
>
> That said, I think the above might not be right on ARM if we want to
> make it the rule, Will, what do you reckon ?

So there are two cases to consider:

1.
	if (readl(DEVICE_DMA_STATUS) == DMA_DONE)
		mydata = *dma_bufp;

2.
	*dma_bufp = 42;
	readl(DEVICE_DMA_KICK_ON_READ);

For arm/arm64 we guarantee ordering for (1) but not for (2) -- you'd need to
add an mb() to make it work.
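
To make the fix for case (2) concrete, here is a minimal userspace sketch of where the mb() would go. Everything in it is an assumption for illustration: fake_kick_reg and fake_dma_buf are hypothetical stand-ins for the device register and the DMA buffer, kick_dma_on_read() is a made-up helper, and mb()/readl() are simplified stubs so the snippet builds outside the kernel. It shows the placement of the barrier only; it does not reproduce the real arm/arm64 implementations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins (assumptions) so the sketch builds outside the kernel. */
static uint32_t fake_kick_reg;	/* hypothetical DEVICE_DMA_KICK_ON_READ */
static uint32_t fake_dma_buf;	/* hypothetical coherent DMA buffer */

/* Stand-in for the kernel's mb(): a full fence. */
static inline void mb(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/* Stand-in for readl(): a plain volatile 32-bit load, no extra barriers. */
static inline uint32_t readl(const volatile void *addr)
{
	assert(addr != NULL);
	return *(const volatile uint32_t *)addr;
}

/* Case (2): write the buffer, then kick the device with a read. */
static uint32_t kick_dma_on_read(volatile uint32_t *dma_bufp, uint32_t val)
{
	*dma_bufp = val;	/* write to the DMA buffer */
	mb();			/* order the buffer write before the MMIO read */
	return readl(&fake_kick_reg);
}
```

Without the mb(), nothing in this sketch orders the buffer write against the register read, which is the gap described above for arm/arm64.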

Do both of these work on power? If so, I guess I can make readl even more
expensive :/ Feels a bit like the tail wagging the dog, though.

Another thing I just realised is that we restrict the barriers we use in
readl/writel on arm64 so that they don't necessarily apply to both loads and
stores. To be specific:

   writel is ordered against prior writes to memory, but not reads

   readl is ordered against subsequent reads of memory, but not writes (but
   note that in example (1) above, the control dependency ensures that).
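
The two rules above amount to a write barrier placed before the MMIO store and a read barrier placed after the MMIO load. The following is a toy model of that placement, not the arm64 code: model_wmb(), model_rmb(), model_writel() and model_readl() are hypothetical names, the "barriers" only append to a trace string, and fake_reg is an ordinary variable standing in for a device register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static char trace[64];		/* records the order of operations */
static uint32_t fake_reg;	/* hypothetical device register */

static void log_op(const char *op)
{
	/* Guard against overflowing the fixed-size trace buffer. */
	assert(strlen(trace) + strlen(op) < sizeof(trace));
	strcat(trace, op);
}

/* Model barriers: they only log, they do not order anything. */
static void model_wmb(void) { log_op("wmb;"); }
static void model_rmb(void) { log_op("rmb;"); }

/* Write barrier first, so prior memory writes are ordered before the store. */
static void model_writel(uint32_t v, volatile uint32_t *reg)
{
	model_wmb();
	*reg = v;
	log_op("store;");
}

/* Read barrier last, so later memory reads are ordered after the load. */
static uint32_t model_readl(const volatile uint32_t *reg)
{
	uint32_t v = *reg;

	log_op("load;");
	model_rmb();
	return v;
}
```

In this model nothing sits before the load in model_readl(), which is why case (2) above is not ordered by readl alone.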

If necessary, I could move the barrier in our readl implementation to be
before the read, then play the control-dependency + instruction-sync (ISB)
trick that you do on power.

Will