Thread: 32 messages, 8 authors, 2008-02-01

Re: e1000 full-duplex TCP performance well below wire speed

From: Bruce Allen <hidden>
Date: 2008-01-31 19:14:11

Hi Auke,
>>>> Important note: we ARE able to get full duplex wire speed (over 900
>>>> Mb/s simultaneously in both directions) using UDP.  The problems occur
>>>> only with TCP connections.
>>>
>>> That eliminates bus bandwidth issues, probably, but small packets take
>>> up a lot of extra descriptors, bus bandwidth, CPU, and cache resources.
>>
>> I see.  Your concern is the extra ACK packets associated with TCP.  Even
>> though these represent a small volume of data (around 5% with MTU=1500,
>> and less at larger MTU) they double the number of packets that must be
>> handled by the system compared to UDP transmission at the same data
>> rate. Is that correct?
>
> A lot of people tend to forget that the PCI-express bus has enough
> bandwidth at first glance (2.5 Gbit/s for 1 Gbit/s of traffic), but apart
> from the data going over it there is significant overhead: each
> packet requires transmit, cleanup and buffer transactions, and there are
> many irq register clears per second (slow ioread/writes). The
> transactions double for TCP ACK processing, and this all accumulates and
> starts to introduce latency, higher CPU utilization, etc.
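To put rough numbers on the ACK and bus-overhead argument quoted above, here is a back-of-envelope sketch. Every constant in it is an assumption, not a measurement from this thread: 38 bytes of per-frame Ethernet overhead on the wire (header, FCS, preamble/SFD and inter-frame gap), a bare ACK occupying a minimum 64-byte frame, a PCIe 1.x x1 link at 2.5 GT/s with 8b/10b encoding, a hypothetical 128-byte maximum payload size, and about 24 bytes of overhead per TLP. DLLPs, descriptor fetches/write-backs and register accesses are not modelled, and the function names are made up for illustration.

```python
# Back-of-envelope sketch; every constant below is an assumption, not a measurement.

ETH_OVERHEAD = 38        # bytes per frame on the wire: 14 header + 4 FCS + 8 preamble/SFD + 12 IFG
MIN_FRAME_ON_WIRE = 84   # a bare TCP ACK: 64-byte minimum frame + 8 preamble/SFD + 12 IFG
LINE_RATE_BPS = 1e9      # gigabit Ethernet, one direction


def frames_per_second(mtu: int, line_rate_bps: float = LINE_RATE_BPS) -> float:
    """Full-size frames per second needed to fill the line in one direction."""
    return line_rate_bps / ((mtu + ETH_OVERHEAD) * 8)


def ack_wire_fraction(mtu: int, segments_per_ack: int = 1) -> float:
    """ACK bytes on the reverse wire relative to data bytes on the forward wire."""
    return MIN_FRAME_ON_WIRE / ((mtu + ETH_OVERHEAD) * segments_per_ack)


def pcie_usable_bps(lanes: int = 1, gt_per_s: float = 2.5e9,
                    max_payload: int = 128, tlp_overhead: int = 24) -> float:
    """Rough usable PCIe 1.x payload bandwidth per direction.

    8b/10b encoding leaves 80% of the raw signalling rate; each TLP then
    carries `max_payload` data bytes plus about `tlp_overhead` bytes of
    framing, sequence number, header and LCRC. DLLPs are ignored.
    """
    raw_bps = lanes * gt_per_s * 8 / 10
    return raw_bps * max_payload / (max_payload + tlp_overhead)


if __name__ == "__main__":
    pps = frames_per_second(1500)
    print(f"MTU 1500: {pps:,.0f} data frames/s per direction")
    print(f"ACK overhead: {ack_wire_fraction(1500):.1%} (1 ACK/segment), "
          f"{ack_wire_fraction(1500, 2):.1%} (delayed ACK)")
    print(f"PCIe x1 usable: {pcie_usable_bps() / 1e9:.2f} Gb/s per direction")
```

Under these assumptions it reports 81,274 full-size frames/s per direction, an ACK overhead of 5.5% (2.7% with one ACK per two segments), and about 1.68 Gb/s of usable PCIe payload bandwidth per direction. Roughly 1 Gb/s of packet data crosses the link in each direction during a full-duplex test, so the remaining headroom is what descriptor and register traffic has to fit into; the sketch alone cannot say whether it does.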
Based on the discussion in this thread, I am inclined to believe that lack 
of PCI-e bus bandwidth is NOT the issue.  The theory is that the extra 
packet handling associated with TCP acknowledgements is pushing the PCI-e 
x1 bus past its limits.  However, the evidence seems to show otherwise:

(1) Bill Fink has reported the same problem on a NIC with a 133 MHz 64-bit 
PCI connection.  That connection can transfer data at 8Gb/s.

(2) If the theory is right, then doubling the MTU from 1500 to 3000 should 
have significantly reduced the problem, since it halves the number of ACKs. 
Similarly, going from MTU 1500 to MTU 9000 should reduce the number of 
ACKs by a factor of six, practically eliminating the problem. 
But changing the MTU size does not help.

(3) The interrupt counts are quite reasonable.  Broadcom NICs without 
interrupt aggregation generate an order of magnitude more irq/s and this 
doesn't prevent wire speed performance there.

(4) The CPUs on the system are largely idle.  There are plenty of 
computing resources available.

(5) I don't think that the overhead will increase the bandwidth needed by 
more than a factor of two.  Of course you and the other e1000 developers 
are the experts, but the dominant bus cost should be copying data buffers 
across the bus. Everything else is minimal in comparison.
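The ACK scaling in point (2) can be checked with a few lines of Python. This sketch assumes one ACK per full-size segment at a 1 Gb/s line rate and counts 38 bytes of per-frame Ethernet overhead on the wire; `acks_per_second` is a hypothetical helper, not part of any tool.

```python
# Assumed per-frame wire overhead: 14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap.
ETH_OVERHEAD = 38


def acks_per_second(mtu: int, line_rate_bps: float = 1e9, segments_per_ack: int = 1) -> float:
    """ACKs/s the receiver sends while the sender fills the line with full-size segments."""
    data_frames = line_rate_bps / ((mtu + ETH_OVERHEAD) * 8)
    return data_frames / segments_per_ack


if __name__ == "__main__":
    base = acks_per_second(1500)
    for mtu in (1500, 3000, 9000):
        rate = acks_per_second(mtu)
        print(f"MTU {mtu:5d}: {rate:8,.0f} ACKs/s ({base / rate:.2f}x reduction vs MTU 1500)")
```

The reductions come out at about 1.98x for MTU 3000 and 5.88x for MTU 9000, slightly below 2 and 6 because the fixed per-frame overhead is amortised over more payload at larger MTUs.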

Intel insiders: isn't there some simple instrumentation available (which 
read registers or statistics counters on the PCI-e interface chip) to tell 
us statistics such as how many bits have moved over the link in each 
direction? This plus some accurate timing would make it easy to see if the 
TCP case is saturating the PCI-e bus.  Then the theory could be addressed 
with data rather than with opinions.
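Short of PCIe-level counters, the kernel's per-interface byte counters can at least be sampled with accurate timing to confirm what actually crosses the wire in each direction. This sketch assumes Linux sysfs (`/sys/class/net/<iface>/statistics/rx_bytes` and `tx_bytes`). These count network bytes handled by the driver, not PCIe transactions, so on their own they cannot show bus saturation. The function names and the `root` parameter (there only so the code can be pointed at a test directory) are hypothetical.

```python
import time
from pathlib import Path
from typing import Tuple


def read_byte_counters(iface: str, root: str = "/sys/class/net") -> Tuple[int, int]:
    """Return (rx_bytes, tx_bytes) from the kernel's per-interface statistics."""
    stats = Path(root) / iface / "statistics"
    rx = int((stats / "rx_bytes").read_text())
    tx = int((stats / "tx_bytes").read_text())
    return rx, tx


def sample_rates(iface: str, interval: float = 1.0,
                 root: str = "/sys/class/net") -> Tuple[float, float]:
    """Read the counters twice, `interval` seconds apart; return (rx, tx) in bits/s."""
    t0 = time.monotonic()            # monotonic clock: immune to wall-clock adjustments
    rx0, tx0 = read_byte_counters(iface, root)
    time.sleep(interval)
    t1 = time.monotonic()
    rx1, tx1 = read_byte_counters(iface, root)
    elapsed = t1 - t0
    return (rx1 - rx0) * 8 / elapsed, (tx1 - tx0) * 8 / elapsed
```

For example, `sample_rates("eth0", 5.0)` would return the receive and transmit rates of a (hypothetical) interface `eth0` in bits per second, averaged over five seconds. `ethtool -S <iface>` can expose further driver-specific statistics, though which counters exist depends on the driver.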

Cheers,
 	Bruce