Thread: 5 messages, 4 authors, 1999-08-18

Re: [linux-fbdev] Re: readl() and friends and eieio on PPC

From: Gabriel Paubert <hidden>
Date: 1999-08-18 11:02:43

On Thu, 12 Aug 1999, Geert Uytterhoeven wrote:
> On Thu, 12 Aug 1999, Paul Mackerras wrote:
> > Richard Henderson [off-list ref] wrote:
> > The results tended to vary quite a lot from run to run, but here's a
> > typical set:
> >
> > 17 10 9 9 9
> > 24 17 16 16 16
> > 732 731 736 786 727
> > 666 755 840 774 801
> >
> > So the eieio doesn't look to be nearly as expensive on PPC as wmb is
> > on alpha.  (16 - 9) / 7 = 1 cycle for the eieio, which is going to be
>
> I'm seeing different things (results don't tend to vary a lot):
>
> | [14:27:01]/tmp# ./a.out 0xc2800000
> | 35 29 30 31 28
> | 261 251 247 248 248
> | 429 332 358 374 348
> | 541 532 529 531 529
> | [14:27:05]/tmp#
>
> Hence eieio() is quite expensive on memory.
>
> This is on an IBM LongTrail (CHRP), with 604e at 200 MHz, 512 KB L2 cache,
> 66 MHz SDRAM bus, and 33 MHz PCI to an ATI RAGE II+.

Not surprising: on 603 and G3, eieio is an internal operation (it
prevents some forms of write combining on the G3). On 604 (and
601 AFAIR) every eieio translates into an actual bus cycle, which takes
time. Don't ask me exactly why (probably SMP issues).

However, expect the cost of always inserting an eieio to become huge
on a G4 if it ever comes out: it has longer memory queues and should
perform more aggressive combining of memory operations to adjacent
addresses.

Also, a smart host bridge can merge writes from a processor into a burst
PCI transaction; the eieio cycle tells it where it has to break the burst.
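
A minimal sketch of the trade-off described above, not code from this
thread. It assumes GCC-style inline assembly; the names io_order,
fill_ordered and fill_batched are hypothetical, and on non-PowerPC hosts
the barrier falls back to a compiler-only barrier so that the sketch
still builds. Whether stores are actually merged depends on the
processor and the host bridge.

```c
#include <stddef.h>
#include <stdint.h>

/* Ordering barrier: eieio on PowerPC. The compiler-only fallback is an
 * assumption made so the sketch compiles on other hosts. */
#if defined(__powerpc__) || defined(__powerpc64__) || defined(__PPC__)
#define io_order() __asm__ __volatile__("eieio" : : : "memory")
#else
#define io_order() __asm__ __volatile__("" : : : "memory")
#endif

/* Barrier after every store: each write is kept separate, so neither the
 * processor nor the bridge may combine neighbouring stores. */
void fill_ordered(volatile uint32_t *dst, uint32_t value, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = value;
        io_order();
    }
}

/* One barrier at the end: stores to adjacent addresses remain candidates
 * for combining or bursting before the final ordering point. */
void fill_batched(volatile uint32_t *dst, uint32_t value, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = value;
    io_order();
}
```

The first form suits device registers whose write order matters; the
second suits bulk framebuffer fills, where only the ordering against
later register accesses matters.
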
> > insignificant in the context of an access to a device register, which
> > can easily take ~ 50 to 100 cycles.
>
> For ISA (through PCI/ISA bridge). Isn't real PCI faster?

Depends on your processor clock and on whether you are speaking of reads
or writes. With posted writes, which effectively stop at the host bridge,
this figure does indeed sound exaggerated (core / bus ratio between 3 and 6,
around 4 processor bus clocks for a single-beat cycle).
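
The estimate in parentheses is a simple product, sketched here with a
hypothetical helper name (posted_write_core_cycles); it only restates
the figures given above and is not a measurement.

```c
/* Core cycles spent on a posted write that stops at the host bridge:
 * processor bus clocks for the cycle times the core/bus clock ratio. */
unsigned posted_write_core_cycles(unsigned core_bus_ratio, unsigned bus_clocks)
{
    return core_bus_ratio * bus_clocks;
}
```

With 4 bus clocks this gives 12 core cycles at a ratio of 3 and 24 at a
ratio of 6, both well below the 50 to 100 cycles quoted.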

OTOH, when filling a framebuffer, the buffers in the host bridge are
rapidly filled, write posting does not help and the figure might be
reasonable.

	Greetings,
	Gabriel.


[[ This message was sent via the linuxppc-dev mailing list.  Replies are ]]
[[ not  forced  back  to the list, so be sure to Cc linuxppc-dev if your ]]
[[ reply is of general interest. Please check http://lists.linuxppc.org/ ]]
[[ and http://www.linuxppc.org/ for useful information before posting.   ]]