Thread (13 messages, 7 authors, 2009-06-19)

Re: [PATCH] powerpc: tiny memcpy_(to|from)io optimisation

From: Albrecht Dreß <hidden>
Date: 2009-06-02 18:46:04

On 2009-06-01 08:14, Joakim Tjernlund wrote:
> ... not even 4.2.2, which is fairly modern, will get it right. It breaks
> very easily, as gcc has never been any good at this type of
> optimization. Sometimes small changes will make gcc unhappy and it
> won't do the right optimization.
It's even worse... Consider the assembly output for this simple
function:

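<snip>
#include <stdint.h>

void loop2(void * src, void * dst, int n)
{
   /* Start one word before each buffer so the pre-increments below can
      map onto update-form loads/stores (lwzu/stwu). */
   volatile uint32_t * _dst = (volatile uint32_t *) ((char *) dst - 4);
   volatile uint32_t * _src = (volatile uint32_t *) ((char *) src - 4);
   n >>= 2;          /* byte count -> number of 32-bit words */
   do {
     *(++_dst) = *(++_src);
   } while (--n);
}
</snip>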
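
As written, loop2 relies on its caller: the do/while body runs once before --n is tested, so a byte count below 4 still copies one word and then keeps counting down from -1 instead of stopping (a count that is not a multiple of 4 simply has its remainder dropped). A minimal sketch of a guarded variant, assuming 4-byte-aligned buffers; the name loop2_guarded is hypothetical, and the check sits ahead of the loop without any claim about what gcc generates for it:

```c
#include <stdint.h>

/* Hypothetical guarded variant of loop2: same pre-increment access
 * pattern, but returns early when fewer than 4 bytes are requested.
 * Assumes src and dst are 4-byte aligned. */
void loop2_guarded(void *src, void *dst, int n)
{
    volatile uint32_t *_dst = (volatile uint32_t *)((char *)dst - 4);
    volatile uint32_t *_src = (volatile uint32_t *)((char *)src - 4);

    n >>= 2;                 /* bytes -> 32-bit words, remainder dropped */
    if (n <= 0)
        return;              /* loop2 would enter the do/while anyway */
    do {
        *(++_dst) = *(++_src);
    } while (--n);
}
```

For example, loop2_guarded(src, dst, 16) copies four words, while a call with n = 3 leaves dst untouched.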

With the options "-O3 -mcpu=603e -mtune=603e", gcc 4.0.1 as shipped with
Apple's Developer Tools (on Tiger) produces

<snip>
_loop2:
         srawi r5,r5,2
         mtctr r5
         addi r4,r4,-4
         addi r3,r3,-4
L11:
         lwzu r0,4(r3)
         stwu r0,4(r4)
         bdnz L11
         blr
</snip>

which looks perfect to me. However, with the same options, gcc 4.3.3 on
Ubuntu/PPC produces

<snip>
loop2:
         srawi 5,5,2
         stwu 1,-16(1)
         mtctr 5
         li 9,0
.L8:
         lwzx 0,3,9
         stwx 0,4,9
         addi 9,9,4
         bdnz .L8
         addi 1,1,16
         blr
</snip>

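which wastes a register (r9, used as an index) and an instruction in the
loop body (the separate "addi 9,9,4"), because it uses the indexed
lwzx/stwx instead of the update forms lwzu/stwu. It also fiddles with
the stack pointer for no good reason. gcc 4.4.0 produces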

<snip>
loop2:
         srawi 5,5,2
         mtctr 5
         li 9,0
.L9:
         lwzx 0,3,9
         stwx 0,4,9
         addi 9,9,4
         bdnz .L9
         blr
</snip>

which drops the r1 accesses, but still produces the sub-optimal loop.   
Is this a gcc regression, or did I miss something here?  Probably the  
only bullet-proof way is to write some core loops in assembly... :-/
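
If one does go the assembly route, a GCC inline-assembly sketch can pin the loop to the lwzu/stwu/bdnz form that gcc 4.0.1 emitted above. This is a hypothetical helper (copy_words is not the kernel's memcpy_toio()/memcpy_fromio() code), written assuming GCC's PowerPC "b" constraint (a base register other than r0) and the "ctr" clobber; on other targets it falls back to a plain C loop so the sketch still builds:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy nwords 32-bit words with volatile accesses.
 * On PowerPC with GCC the loop is fixed in inline assembly; elsewhere
 * a plain C loop is used. */
static void copy_words(volatile uint32_t *dst, const volatile uint32_t *src,
                       size_t nwords)
{
    if (nwords == 0)
        return;                              /* ctr == 0 would wrap in bdnz */
#if defined(__GNUC__) && defined(__powerpc__)
    uint32_t tmp;
    const volatile uint32_t *s = src - 1;    /* pre-bias, as loop2 does */
    volatile uint32_t *d = dst - 1;

    __asm__ __volatile__(
        "mtctr  %3\n"
        "1:\n\t"
        "lwzu   %0,4(%1)\n\t"                /* tmp = *++s */
        "stwu   %0,4(%2)\n\t"                /* *++d = tmp */
        "bdnz   1b"                          /* --ctr, loop while != 0 */
        : "=&r" (tmp), "+b" (s), "+b" (d)
        : "r" (nwords)
        : "ctr", "memory");
#else
    for (size_t i = 0; i < nwords; i++)
        dst[i] = src[i];
#endif
}
```

The early return matters because bdnz decrements ctr before testing it: entering the loop with ctr = 0 would wrap around and copy about 2^32 words (2^64 on 64-bit).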

Thanks, Albrecht.
