Thread: 95 messages, 32 authors, 2014-06-25

[PATCH 2/2] mtd: orion-nand: fix build error with ARMv4

From: Jason Gunthorpe <hidden>
Date: 2014-05-13 20:56:02
Also in: lkml

On Fri, May 09, 2014 at 07:09:15PM -0300, Ezequiel Garcia wrote:
On 09 May 03:28 PM, Jason Gunthorpe wrote:
I gave this a try in order to answer Arnd's performance
question. First of all, the patch seems wrong. I guess it's because
readsl reads 4-byte pieces instead of 8-byte ones.

This patch below is tested (but not completely, see below) and works:
Compilers are better now, I think you can just ditch the weirdness:
[..]
The below gives:

  c8:   ea000002        b       d8 <orion_nand_read_buf+0x84>
  cc:   e5dc0000        ldrb    r0, [ip]
  d0:   e7c30001        strb    r0, [r3, r1]
  d4:   e2811001        add     r1, r1, #1
  d8:   e1510002        cmp     r1, r2

Which looks the same as the asm version to me.
Nice! It wasn't really needed but since I have the board here:

# time nanddump /dev/mtd5 -f /dev/null -q
real	0m 5.82s
user	0m 0.20s
sys	0m 5.60s

Jason: Care to submit a proper patch?
Sure, but did anyone (Arnd?) have thoughts on a better way to do this:

+#ifdef CONFIG_64BIT
+               buf64[i++] = readq_relaxed(io_base);
+#else
+               buf64[i++] = *(const volatile u64 __force *)io_base;
+#endif

IMHO, readq should exist on any platform that can issue a 64-bit bus
transaction, and I expect many ARMs qualify.
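
For readers following along, here is a rough userspace sketch of the copy loop under discussion: byte reads until the destination is 8-byte aligned, 64-bit reads for the bulk, byte reads for the tail. It is not the driver code. `struct fake_port`, `port_read8()`, `port_read64()` and `sketch_read_buf()` are hypothetical names that stand in for the NAND data register, readb() and readq_relaxed(), and the sketch uses memcpy() for the stores where the patch uses an aligned u64 pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the NAND data port: every access hands out
 * the next bytes of a canned stream, the way a FIFO register would. */
struct fake_port {
	const uint8_t *stream;	/* bytes the "device" will return */
	size_t pos;		/* index of the next byte to return */
};

/* Byte-wide access, the userspace analogue of readb(). */
static uint8_t port_read8(struct fake_port *p)
{
	return p->stream[p->pos++];
}

/* 64-bit access, the analogue of readq_relaxed(): eight stream bytes in
 * memory order, so storing the value reproduces the stream unchanged. */
static uint64_t port_read64(struct fake_port *p)
{
	uint64_t v;

	memcpy(&v, p->stream + p->pos, sizeof(v));
	p->pos += sizeof(v);
	return v;
}

/* Copy len bytes from the port into buf: single bytes until buf is
 * 8-byte aligned, then 64-bit words, then the leftover tail bytes. */
static void sketch_read_buf(struct fake_port *p, uint8_t *buf, size_t len)
{
	size_t i = 0;

	assert(p != NULL && buf != NULL);

	while (i < len && ((uintptr_t)(buf + i) & 7) != 0)
		buf[i++] = port_read8(p);

	while (len - i >= 8) {
		uint64_t v = port_read64(p);

		memcpy(buf + i, &v, sizeof(v));
		i += 8;
	}

	while (i < len)
		buf[i++] = port_read8(p);
}
```

Calling sketch_read_buf() with a destination that starts 3 bytes past an 8-byte boundary and a length of 29 issues 5 byte reads followed by 3 word reads; the #ifdef in the patch only decides how the word read itself is performed.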
On 08 May 04:56 PM, Arnd Bergmann wrote:
Ok, so it takes 5.6 seconds in kernel mode to access 31MB, which
comes down to roughly 5.5MB/s. That isn't very fast compared to the time
the CPU should take for those instructions, so I'm surprised it
actually makes any difference at all.
Likely, what is happening is that the bus interface is holding off
returning the read data until it completes the bus cycles, then the
response travels to the CPU, which turns around and issues another read.

This creates dead time during which the bus isn't doing anything.

The larger the bus transfer the CPU can do, the smaller the share of
time the turnaround takes as overhead.
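
To put rough numbers on that, here is a toy model, assuming every read pays one fixed turnaround delay on top of a transfer time that scales with its width. The function name and all the figures are made up for illustration; nothing here was measured on the Orion hardware.

```c
#include <assert.h>

/* Fraction of total time spent actually moving data when each read of
 * width_bytes costs width_bytes * ns_per_byte on the bus plus a fixed
 * turnaround_ns of dead time. All parameters are hypothetical. */
static double bus_efficiency(double width_bytes, double ns_per_byte,
			     double turnaround_ns)
{
	double transfer_ns;

	assert(width_bytes > 0 && ns_per_byte > 0 && turnaround_ns >= 0);

	transfer_ns = width_bytes * ns_per_byte;
	return transfer_ns / (transfer_ns + turnaround_ns);
}
```

With an assumed 10 ns per byte and 100 ns of turnaround, the model gives about 9% useful bus time for 1-byte reads, about 29% for 4-byte reads and about 44% for 8-byte reads, which is the trend described above.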

If the CPU could pipeline two reads then it could reach the highest
possible throughput, but I guess the memory ordering for the mapping
prevents that?

Regarding DMA, who knows if the interface can handle a burst
transfer..

Jason