Thread (6 messages), 2 authors, 2006-11-27

Re: PPC440GX ethernet oddities

From: Eugene Surovegin <hidden>
Date: 2006-11-27 04:25:57

On Sun, Nov 26, 2006 at 08:05:49PM -0800, Jeff Mock wrote:
> I'm having a slightly strange behavior with PPC440GX ethernet, I'm
> looking for a little advice where I can poke around to see what's going on.
>
> I have a custom 440GX board, I use the two RGMII gigabit interfaces to
> two Vitesse PHYs.  This works nicely.

What is the CPU clock speed?

> The board has a large FPGA signal processor that is DMA'ing data into
> main memory, the PPC sends data from main memory out the ethernet
> interfaces.  This all works well.  For testing purposes I'm DMA'ing a
> pseudo random sequence at 80MB/s, sending this over ethernet on a TCP
> socket to a server machine and checking the sequence at the receiving
> end.

Could you elaborate a little here? Is the copying done by a user-space
program or by kernel-mode code?

[snip]
> Only one of my four boxes shows this strange behavior, and only when the
> second ethernet port is connected to an ethernet switch. Everything
> still works properly, my 80MB/s pseudo random sequence is still
> generated by the FPGAs and checked by a server on the other end of the
> network connection.  I let the ring buffer get as large as 64MB before
> failing, but the large ring buffer says that the network connection
> sometimes gets as much as 25MB behind the FPGA DMA, or 25/80 = 0.3125
> seconds, which seems kind of crazy.

Well, 300ms doesn't look particularly crazy to me for a data stream
like this, given that you are running non-realtime OSes on both ends.
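
For reference, the figures quoted above can be turned into stall
durations with a one-line calculation. This is only a sketch of the
arithmetic; the function name is made up for illustration, and it
assumes the FPGA keeps writing at a constant 80MB/s while the network
side is stalled.

```python
def stall_time_s(backlog_mb, rate_mb_s):
    """Seconds of stalled network output that a backlog of backlog_mb
    megabytes represents when the producer writes at rate_mb_s."""
    return backlog_mb / rate_mb_s

# Observed worst-case backlog: 25MB behind at 80MB/s.
print(stall_time_s(25, 80))   # 0.3125

# Headroom before the 64MB ring buffer overflows at the same rate.
print(stall_time_s(64, 80))   # 0.8
```

So the 64MB ring buffer rides out a stall of up to 0.8 seconds before
it fails.
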
> I look at "ifconfig" (busybox ifconfig) and I see no errors on the
> ethernet interface.

Try ethtool -S; the EMAC driver supports it.

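
One way to use those statistics is to take a snapshot before and after
the ring buffer starts growing and look only at the counters that
moved. The following is a rough sketch, not part of the driver or of
ethtool itself: the helper names are made up, it assumes Python and
ethtool are available on the target (or that the output is copied to
another machine), and it assumes "eth0" is the interface name. The
counter names in the sample text are placeholders; the EMAC driver's
actual names may differ.

```python
import re
import subprocess

def parse_ethtool_stats(text):
    """Parse `ethtool -S` output into a {counter_name: int} dict."""
    stats = {}
    for line in text.splitlines():
        # Counter lines look like "     rx_packets: 12345". The
        # "NIC statistics:" header has no number and is skipped.
        m = re.match(r"\s*(\S.*?):\s*(-?\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats

def changed_counters(before, after):
    """Return {name: delta} for counters whose value changed."""
    return {name: after[name] - before[name]
            for name in after
            if name in before and after[name] != before[name]}

def snapshot(iface="eth0"):
    """Run `ethtool -S <iface>` and parse it (interface name assumed)."""
    out = subprocess.run(["ethtool", "-S", iface], capture_output=True,
                         text=True, check=True).stdout
    return parse_ethtool_stats(out)

# Placeholder output, for illustration only.
sample_before = "NIC statistics:\n     rx_packets: 100\n     rx_errors: 0\n"
sample_after = "NIC statistics:\n     rx_packets: 250\n     rx_errors: 3\n"
print(changed_counters(parse_ethtool_stats(sample_before),
                       parse_ethtool_stats(sample_after)))
```

On the board itself, calling snapshot() twice around a stall and
passing both results to changed_counters() would show which error or
drop counters, if any, increased during the stall.
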
> I'm guessing there might be some design problem or
> maybe just a problem with this one particular board that is causing
> errors that occasionally slows down the TCP connection, perhaps
> crosstalk between the two RGMII interfaces or maybe some interaction
> between the magnetics on the two ports, but I can't figure out where to
> look to measure errors on the physical ethernet interface.
>
> Can someone give me a hint about where to look for this problem? This is
> a 2.6.15 kernel.

There are a few things you can try:

1) Try another GigE switch, preferably from a different vendor

2) Try another peer system to stream your data to (preferably with a
different GigE NIC)

3) Check if hw flow-control is enabled and try disabling it on both 
ends

4) Play with socket buffer sizes on both ends

5) Finally, and most usefully, get a realtime capture of the TCP
stream at the moment your ring buffer starts growing. The best way
is to use a GigE switch that supports port mirroring, or a dedicated
hw network analyzer.
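
As an illustration of item 4, socket buffer sizes are set per socket
with SO_SNDBUF and SO_RCVBUF. The sketch below uses Python only for
brevity; the sender discussed in this thread may well be written in
another language, and the helper names and the 256KB figure are
arbitrary choices for the example. On Linux the kernel doubles the
requested value for bookkeeping and caps it at net.core.wmem_max and
net.core.rmem_max, so it is worth reading the value back.

```python
import socket

def make_stream_socket(sndbuf=None, rcvbuf=None):
    """Create a TCP socket and optionally request buffer sizes.

    Sizes should be set before connect() or listen() so that they are
    in effect when the TCP window is negotiated.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if sndbuf is not None:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, sndbuf)
    if rcvbuf is not None:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf)
    return sock

def effective_buffers(sock):
    """Return the (send, receive) sizes the kernel actually applied."""
    return (sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF),
            sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))

# Request 256KB each way (an arbitrary example value) and check
# what was granted.
s = make_stream_socket(sndbuf=256 * 1024, rcvbuf=256 * 1024)
print(effective_buffers(s))
s.close()
```

If the values read back are much smaller than requested, the system
limits need raising before larger sizes have any effect.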

-- 
Eugene