Re: [PATCH v6 2/2] clocksource: add J-Core timer/clocksource driver

From: Rich Felker <hidden>
Date: 2016-08-24 21:45:41
Also in: linux-sh, lkml

On Wed, Aug 24, 2016 at 10:22:13PM +0100, Mark Rutland wrote:

On Wed, Aug 24, 2016 at 04:52:26PM -0400, Rich Felker wrote:

quoted

On Wed, Aug 24, 2016 at 10:01:08PM +0200, Arnd Bergmann wrote:

quoted

On Wednesday, August 24, 2016 1:40:01 PM CEST Rich Felker wrote:

quoted

I actually tried multiple times to find good resources on policy for
which form to prefer, but didn't have much luck. My understanding is
that __raw_writel/__raw_readl always performs a native-endian
load/store, whereas writel/readl behavior depends on cpu endianness
and whether the arch declares that "pci bus" (that was the term I
found in the source, not relevant here) io is endian-swapped or not.
Can you give me a better explanation of why we might prefer one form
or the other?

In general, a portable driver should use readl/writel because they
are defined to have the behavior that x86 provides, which is what
most developers understand best. Compared to __raw_readl/__raw_writel,
it guarantees

- correct endianess (in most cases, exceptions see below)
- does not leak out of spinlocks

Not even normal non-volatile accesses can leak across spinlocks;
they're memory+compiler barriers.

Note that Arnd was talking in general, about cross-architectural guarantees.

Across architectures, not all barriers are equivalent. On some architectures,
the barriers used by spinlocks don't necessarily order accesses across
disparate memory types (i.e. DRAM and IO), or might not guarantee that the
ordering is consistent to observers other than (coherent) CPUs. Hence a
portable driver should care about this distinction.

If they're equivalent for you, you may as well use the portably stronger form,
as it's less likely to surprise others.

*nod* Understood.

quoted

- accesses are word-sized, the compiler cannot split them into
  byte accesses, or merge subsequent accesses into larger words

Based on my reading of the code that's also true for the raw ones. At
least GCC interprets volatile as disallowing split/combined
loads/stores when the hardware supports the nominal load/store size,
and this seems to agree with the intent of the C standard that
volatile be usable for memory-mapped devices..

quoted

- ordering against dma_map_*/dma_unmap_*/dma_sync_* is maintained

Aren't they barriers too?

As above, not all barriers are equivalent. Your architecture may provide
stronger (or weaker) guarantees than others.

OK.

quoted

The __raw_* accessors are used as building blocks for
readl/outl/ioread32/ioread32_be/readl_relaxed/... and they can
usually be used safely on device RAM such as framebuffers but
not much else in portable code. Some architectures also use the
__raw_* accessors for MMIO registers, but that code is generally
not portable.

My concern is that, depending on some arcane defines, readl/writel
could start doing byte-reversal of values since the CPU is big-endian.
arch/sh seems to have them defined in such a way that this doesn't
happen, but I was hoping to avoid depending on that. If you think it's
safe to assume they do the right thing, though, I'm okay with it. If
anything breaks when we try a little-endian version of the CPU/SoC
later I can go back and fix it.

I guess this really depends on the expectations you have w.r.t. the endianness
of devices vs the endianness of CPUs. If you expect the two to always be
matched, please add a comment in the driver to that effect.

As Arnd and others mentioned, the vastly common case is that the endianness of
CPUs and devices is not fixed to each other, and devices are typically
little-endian.

Since it's not used in other designs, I can't say anything meaningful
about how it hypothetically might be, but the intent on all current
and currently-planned systems is that 32-bit loads/stores be performed
using values that are meaningful as values on the cpu, i.e. with no
byte swapping regardless of cpu endianness. The current hardware is
designed as big-endian, and the proposed patches for supporting little
endian just involve flipping the low bits of the address for 16- and
8-bit accesses (not relevant to mmio registers that are only
accessible in 32-bit units).

quoted

Endianess is always more complicated than one thinks, and it's
handled differently across architectures, sometimes because of
hardware differences, sometimes for historic reasons that differ
only in the Linux implementation.

A large majority of devices are fixed to little-endian MMIO
registers (as recommended by PCI), so that is a good default.

Exceptions to this are:

* Some architectures (e.g. PowerPC) typically run big-endian code,
  and hardware designers decided to make the MMIO registers the
  same. In this case, you can generally use ioread32_be/iowrite32_be.

This is something like our case, but the mmio registers are accessed
as 32-bit values and the endianness does not come into play.

This really depends on you endianness model, bus, etc. The above sounds like
BE32 rather than BE8, assuming I've understood correctly, which is an unusual
case nowadays.

I'm not familiar with those classifications, but from what I can tell,
BE32 describes it correctly. I'll see if I can get someone to verify
this. Is there a reason it's not widely used anymore? Perhaps
something related to supporting misaligned word-sized loads/stores?

Rich
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help