Thread (126 messages), 14 authors, 2018-04-02

Re: RFC on writel and writel_relaxed

From: Gabriel Paubert <hidden>
Date: 2018-03-22 11:25:37
Also in: linux-rdma

On Thu, Mar 22, 2018 at 08:25:43PM +1100, Oliver wrote:
On Thu, Mar 22, 2018 at 7:20 PM, Gabriel Paubert [off-list ref] wrote:
quoted
On Thu, Mar 22, 2018 at 04:24:24PM +1100, Oliver wrote:
quoted
On Thu, Mar 22, 2018 at 1:35 AM, David Laight [off-list ref] wrote:
quoted
quoted
x86 has compiler barrier inside the relaxed() API so that code does not
get reordered. ARM64 architecturally guarantees device writes to be observed
in order.
There are places where you don't even need a compile barrier between
every write.

I had horrid problems getting some ppc code (for a specific embedded SoC)
optimised to have no extra barriers.
I ended up just writing through 'pointer to volatile' and adding an
explicit 'eieio' between the block of writes and status read.
This is what you are supposed to do. For accesses to MMIO (cache
inhibited + guarded) storage the Power ISA guarantees that load-load
and store-store pairs of accesses will always occur in program order,
but there's no implicit ordering between load-store or store-load
And even for load store, eieio is not always necessary, in the important
case of reading and writing to the same address, when modifying bits in
a control register for example.

Typically also loads will be moved ahead of stores, but not the other
way around, so in practice you won't notice a missed eieio in this case.
This does not mean you should not insert it.
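
For illustration, the "block of writes, then eieio, then status read" pattern could be sketched as below. The register layout (struct dev_regs) and the names mmio_order() and dev_start() are invented for this example and do not come from any real driver. On non-PowerPC builds the helper falls back to a plain compiler barrier, which gives no hardware ordering guarantee and is only there so the sketch compiles elsewhere.

```c
#include <stdint.h>

/* Hypothetical helper: order earlier MMIO accesses before later ones. */
static inline void mmio_order(void)
{
#if defined(__powerpc__) || defined(__powerpc64__) || defined(__PPC__)
    /* eieio orders accesses to cache-inhibited, guarded storage. */
    __asm__ __volatile__("eieio" ::: "memory");
#else
    /* Assumption: compiler barrier only, no hardware ordering implied. */
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Hypothetical register block; a real device defines its own layout. */
struct dev_regs {
    volatile uint32_t data;
    volatile uint32_t ctrl;
    volatile uint32_t status;
};

/* Two stores (kept in program order on guarded storage), then one
 * explicit barrier before the status load, which could otherwise be
 * performed ahead of the stores. */
static inline uint32_t dev_start(struct dev_regs *regs,
                                 uint32_t data, uint32_t ctrl)
{
    regs->data = data;
    regs->ctrl = ctrl;
    mmio_order();
    return regs->status;
}
```

There is no barrier between the two stores because store-store order is already guaranteed for this kind of storage; only the store-to-load transition needs one.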
Yep, but it doesn't really help us here. The generic accessors need to cope
with the general case.
A generic accessor for modifying fields in a device register might be a
useful addition to the current set. This is a fairly frequent operation.

Actually I did add macros to do exactly this in drivers for our own 
hardware here almost 20 years ago. I was fed up with writing
writel(readl(reg) & mask | value, reg), especially when reg was not
that simple (one device had over 100 registers). The macros obviously 
guaranteed that both accesses would be to the same register, something 
easy to get wrong with cut and paste.
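
A minimal sketch of what such a helper might look like, written as an inline function rather than a macro. The names mmio_read32(), mmio_write32() and reg_update_bits() are made up for this example; they are not existing kernel accessors, and the plain volatile accesses merely stand in for readl()/writel(). As in the expression above, the mask selects the bits to keep.

```c
#include <stdint.h>

/* Stand-ins for readl()/writel(): plain volatile accesses (assumption). */
static inline uint32_t mmio_read32(const volatile uint32_t *reg)
{
    return *reg;
}

static inline void mmio_write32(volatile uint32_t *reg, uint32_t val)
{
    *reg = val;
}

/* Hypothetical read-modify-write helper: keep the bits selected by
 * keep_mask, OR in value, and write the result back to the same
 * register. The register is named once, so the read and the write
 * cannot end up pointing at different addresses. */
static inline uint32_t reg_update_bits(volatile uint32_t *reg,
                                       uint32_t keep_mask, uint32_t value)
{
    uint32_t val = (mmio_read32(reg) & keep_mask) | value;

    mmio_write32(reg, val);
    return val;
}
```

A call such as reg_update_bits(&regs->ctrl, ~FIELD_MASK, new_field), with regs and FIELD_MASK again being placeholders, replaces the longhand form and removes the cut-and-paste hazard.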
quoted
quoted
pairs. In those cases you need an explicit eieio barrier between the
two accesses. At the HW level you can think of the CPU as having
separate queues for MMIO loads and stores. Accesses will be added to
the respective queue in program order, but there's no synchronisation
between the two queues. If the CPU is doing write combining it's easy
to imagine the whole store queue being emptied in one big gulp before
the load queue is even touched.
Is write combining allowed on guarded storage?

<Looking at docs>
From PowerISA_V3.0.pdf, Book2, section 1.6.2 "Caching inhibited":

"No combining occurs if the storage is also Guarded"
Yeah it's not allowed. That's what I get for handwaving examples ;)
At least it means that, for cache-inhibited guarded storage, there is a
one-to-one correspondence between instructions and bus cycles. The only
issue left is ordering ;)

	Gabriel