Thread: 13 messages, 7 authors, 2009-06-19

Re: [PATCH] powerpc: tiny memcpy_(to|from)io optimisation

From: Kenneth Johansson <hidden>
Date: 2009-06-03 14:36:43

On Wed, 2009-06-03 at 08:51 +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2009-06-02 at 20:45 +0200, Albrecht Dreß wrote:
> > which drops the r1 accesses, but still produces the sub-optimal loop.
> > Is this a gcc regression, or did I miss something here?  Probably the
> > only bullet-proof way is to write some core loops in assembly... :-/
>
> Well, gcc may be right here. What you call the "optimal" loop uses the
> lwzu instruction. An interesting thing about this instruction is that
> it updates two GPRs at completion (I'm ignoring the load multiple and
> string instructions on purpose here).
>
> I wouldn't be surprised thus if the loop variant with the separate add
> ends up more efficient on most implementations around.

On an e300 core, using lwzu/stwu is about 20% faster, so at least one
core prefers that optimization.
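
As a rough illustration of the two loop shapes being compared (a sketch,
not the code from the patch): the function names below are hypothetical,
and which instructions gcc actually emits for either form depends on the
compiler version, flags and target core. The pointer-stepping form is the
one a compiler can map to update-form loads and stores (lwzu/stwu), while
the indexed form keeps the address arithmetic in a separate add.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Pointer-stepping variant. Each iteration advances both pointers, which
 * a PowerPC compiler may (but need not) turn into lwzu/stwu. lwzu writes
 * the loaded word to one GPR and the updated address back to the base
 * GPR, which is the "two GPRs" point made in the quoted message.
 */
static void copy_words_update(uint32_t *dst, const uint32_t *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}

/*
 * Indexed variant. The loads and stores use a base plus index, and the
 * index is bumped by a separate add each iteration.
 */
static void copy_words_index(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Both functions copy the same data; only the generated loop may differ, so
any speed comparison has to be measured on the core in question.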