Thread: 27 messages, 9 authors, 2008-09-04

Re: Efficient memcpy()/memmove() for G2/G3 cores...

From: David Jander <hidden>
Date: 2008-09-01 07:24:08

On Friday 29 August 2008 14:20:33 Joakim Tjernlund wrote:
[...]
> > The problem is: I have very little experience with powerpc assembly and
> > only very limited time to dedicate to this and I am looking for others
> > who have
>
> I improved the PowerPC memcpy and friends in uClibc a while ago. It does
> basically the same as the kernel memcpy but without any cache
> instructions. It is written in C, but in such a way that
> optimal assembly is generated.

Hmm, isn't that going to break on a different version of gcc?
I just copied the latest version of trunk/uClibc/libc/string/powerpc/memcpy.c 
from subversion as uclibc-memcpy.c, removed the last line and did this:

$ gcc -shared -O2 -Wall -o libucmemcpy.so uclibc-memcpy.c

(should I use other compiler options?)

Then I started my test program with LD_PRELOAD=...

My test program only copies big chunks of aligned memory, so it only tests
maximum throughput (as when copying video frames). I will write a better
one that measures throughput for blocks of different sizes, both aligned and
unaligned, but first I want to find out why I can't get even close to the
expected RAM bandwidth. Bursts occur at 1.6 Gbyte/s, and sustained transfers
might reach 400 Mbyte/s in theory. Taking into account that the video
controller eats almost half of that, I'd like to get somewhere close to
200 Mbyte/s.
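
For reference, a throughput test of the kind described above could look
roughly like the sketch below. This is not the actual test program; the
function name copy_throughput_mbytes and its parameters are made up for
illustration, and the empty asm statement is a GCC-specific barrier that keeps
the compiler from dropping the repeated copies.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy `size` bytes `reps` times with memcpy() and
 * return the copy throughput in Mbyte/s (10^6 bytes per second).
 * Returns -1.0 on bad arguments, allocation failure or a zero interval. */
double copy_throughput_mbytes(size_t size, int reps)
{
    unsigned char *src, *dst;
    struct timespec t0, t1;
    double secs;
    int i;

    if (size == 0 || reps <= 0)
        return -1.0;

    /* malloc() memory is aligned for any basic type (the "aligned" case). */
    src = malloc(size);
    dst = malloc(size);
    if (!src || !dst) {
        free(src);
        free(dst);
        return -1.0;
    }

    /* Touch both buffers first so page faults are not part of the timing. */
    memset(src, 0xA5, size);
    memset(dst, 0, size);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < reps; i++) {
        memcpy(dst, src, size);
        /* GCC-specific compiler barrier: forces every copy to happen. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    secs = (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

    free(src);
    free(dst);

    if (secs <= 0.0)
        return -1.0;
    return ((double)size * (double)reps) / secs / 1e6;
}
```

Calling copy_throughput_mbytes(8u << 20, 50) from a small main() and printing
the result gives one figure per run. For LD_PRELOAD to have any effect, the
memcpy() call has to stay a real library call, so it may be worth building the
test with -fno-builtin; on older glibc versions clock_gettime() also needs
-lrt.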

The result is quite a bit better than that of glibc-2.7 (13.2 Mbyte/s --> 22
Mbyte/s), but still far from the 71.5 Mbyte/s achieved when using bigger
strides of 16 registers load/store at a time.
Note that this is copy performance; one-way throughput should be double these
figures.
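
To illustrate what "16 registers load/store at a time" means, here is a
minimal C sketch of such a copy loop. It is only an assumption about the
general shape, not the code that produced the 71.5 Mbyte/s figure; the name
copy_words16 is hypothetical, and the sketch assumes word-aligned,
non-overlapping buffers. Whether the compiler really keeps all 16 values in
registers depends on the gcc version and options.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `nwords` 32-bit words from src to dst.
 * Each pass of the main loop does 16 loads followed by 16 stores, so
 * up to 16 registers are in flight at once. Buffers must not overlap. */
void copy_words16(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    while (nwords >= 16) {
        /* 16 loads first ... */
        uint32_t r0  = src[0],  r1  = src[1],  r2  = src[2],  r3  = src[3];
        uint32_t r4  = src[4],  r5  = src[5],  r6  = src[6],  r7  = src[7];
        uint32_t r8  = src[8],  r9  = src[9],  r10 = src[10], r11 = src[11];
        uint32_t r12 = src[12], r13 = src[13], r14 = src[14], r15 = src[15];

        /* ... then 16 stores. */
        dst[0]  = r0;  dst[1]  = r1;  dst[2]  = r2;  dst[3]  = r3;
        dst[4]  = r4;  dst[5]  = r5;  dst[6]  = r6;  dst[7]  = r7;
        dst[8]  = r8;  dst[9]  = r9;  dst[10] = r10; dst[11] = r11;
        dst[12] = r12; dst[13] = r13; dst[14] = r14; dst[15] = r15;

        src += 16;
        dst += 16;
        nwords -= 16;
    }

    /* Copy the remaining 0..15 words one at a time. */
    while (nwords > 0) {
        *dst++ = *src++;
        nwords--;
    }
}
```

Sixteen 32-bit words are 64 bytes per loop iteration. Unaligned heads and
tails in bytes would need separate handling, which is left out here.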

I'll try to learn how the cache-manipulating instructions work, to see if I
can gain some more bandwidth using them.

Regards,

-- 
David Jander