RE: e1000 full-duplex TCP performance well below wire speed
From: Bruce Allen <hidden>
Date: 2008-01-31 08:31:39
Hi Jesse,
quoted
It's good to be talking directly to one of the e1000 developers and maintainers. Although at this point I am starting to think that the issue may be TCP stack related and nothing to do with the NIC. Am I correct that these are quite distinct parts of the kernel?
Yes, quite.
OK. I hope that there is also someone knowledgeable about the TCP stack who is following this thread. (Perhaps you also know this part of the kernel, but I am assuming that your expertise is on the e1000/NIC bits.)
quoted
Important note: we ARE able to get full duplex wire speed (over 900 Mb/s simultaneously in both directions) using UDP. The problems occur only with TCP connections.
That eliminates bus bandwidth issues, probably, but small packets take up a lot of extra descriptors, bus bandwidth, CPU, and cache resources.
I see. Your concern is the extra ACK packets associated with TCP. Even though these represent a small volume of data (around 5% with MTU=1500, and less at larger MTU), they double the number of packets that must be handled by the system compared to UDP transmission at the same data rate. Is that correct?
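A rough sketch of this arithmetic in Python follows. The frame sizes are assumptions, not measurements from this thread: a 1500-byte MTU data frame is taken to occupy 1538 bytes on the wire (Ethernet header, FCS, preamble and inter-frame gap included), a bare ACK is taken to occupy 84 bytes (minimum-size frame plus preamble and gap), and one ACK is assumed per data segment (no delayed ACKs).

```python
# Assumed on-wire sizes in bytes (Ethernet header, FCS, preamble and
# inter-frame gap included). Illustrative values, not measured ones.
DATA_FRAME_WIRE_BYTES = 1538   # 1500-byte MTU frame + 38 bytes of overhead
ACK_FRAME_WIRE_BYTES = 84      # 64-byte minimum frame + 20 bytes preamble/gap


def ack_overhead(acks_per_segment=1.0):
    """Return (byte_ratio, packet_ratio) for TCP relative to ACK-free traffic.

    byte_ratio   -- ACK bytes on the wire divided by data bytes on the wire
    packet_ratio -- total packets (data + ACK) divided by data packets
    """
    byte_ratio = acks_per_segment * ACK_FRAME_WIRE_BYTES / DATA_FRAME_WIRE_BYTES
    packet_ratio = 1.0 + acks_per_segment
    return byte_ratio, packet_ratio


byte_ratio, packet_ratio = ack_overhead()
print(f"ACK bytes relative to data bytes: {byte_ratio:.1%}")
print(f"Packets relative to UDP at the same data rate: {packet_ratio:.1f}x")
```

Under these assumptions the ACK volume comes out at roughly 5% of the data volume while the packet count doubles. With delayed ACKs (`ack_overhead(0.5)`) both figures would be lower.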
quoted
I have to wait until Carsten or Henning wake up tomorrow (now 23:38 in Germany). So we'll provide this info in ~10 hours.
I would suggest you try TCP_RR with a command line something like this:
netperf -t TCP_RR -H <hostname> -C -c -- -b 4 -r 64K
I think you'll have to compile netperf with burst mode support enabled.
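For scripting repeated runs, the suggested command line could be assembled as in the following Python sketch. The hostname `node2.example.org` and the helper name `netperf_tcp_rr_command` are hypothetical; the flags are exactly those quoted above. It assumes netperf was built with burst mode (believed to be the `--enable-burst` configure option) and that netserver is running on the remote host.

```python
import shlex


def netperf_tcp_rr_command(hostname, burst=4, request_size="64K"):
    """Build the argument list for the TCP_RR test suggested above."""
    return [
        "netperf",
        "-t", "TCP_RR",      # request/response test
        "-H", hostname,      # remote host running netserver
        "-C", "-c",          # report remote and local CPU utilisation
        "--",                # test-specific options follow
        "-b", str(burst),    # burst size; needs burst mode support
        "-r", request_size,  # request/response size
    ]


# Hypothetical hostname, used only for illustration.
cmd = netperf_tcp_rr_command("node2.example.org")
print(shlex.join(cmd))

# To actually run the test (requires netperf to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```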
I just saw Carsten a few minutes ago. He has to take part in a 'Baubesprechung' meeting this morning, after which he will start answering the technical questions and doing additional testing as suggested by you and others. If you are on the US west coast, he should have some answers and results posted by Thursday morning Pacific time.
quoted
I assume that the interrupt load is distributed among all four cores -- the default affinity is 0xff, and I also assume that there is some type of interrupt aggregation taking place in the driver. If the CPUs were not able to service the interrupts fast enough, I assume that we would also see loss of performance with UDP testing.
quoted
One other thing you can try with e1000 is disabling the dynamic interrupt moderation by loading the driver with InterruptThrottleRate=8000,8000,... (the number of commas depends on your number of ports) which might help in your particular benchmark.
OK. Is 'dynamic interrupt moderation' another name for 'interrupt aggregation'? Meaning that if more than one interrupt is generated in a given time interval, then they are replaced by a single interrupt?
Yes, InterruptThrottleRate=8000 means there will be no more than 8000 ints/second from that adapter, and if interrupts are generated faster than that they are "aggregated." Interestingly, since you are interested in ultra low latency, and may be willing to give up some CPU for it during bulk transfers, you should try InterruptThrottleRate=1 (can generate up to 70000 ints/s).
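To get a feel for what a fixed throttle rate means at gigabit speed, here is a small Python sketch. It assumes full-size frames occupying 1538 bytes on the wire (1500-byte MTU plus Ethernet overhead, preamble and inter-frame gap) and a link running at exactly 1 Gb/s. The helper names are hypothetical, and the second function only formats the per-port option string described above.

```python
WIRE_BITS_PER_SECOND = 1_000_000_000  # gigabit Ethernet line rate
DATA_FRAME_WIRE_BYTES = 1538          # assumed on-wire size of a full frame


def frames_per_interrupt(throttle_rate, frame_wire_bytes=DATA_FRAME_WIRE_BYTES):
    """Average frames arriving per interrupt when the throttle is the limit."""
    frames_per_second = WIRE_BITS_PER_SECOND / (frame_wire_bytes * 8)
    return frames_per_second / throttle_rate


def module_option(throttle_rate, ports):
    """Format the driver option with one comma-separated value per port."""
    return "InterruptThrottleRate=" + ",".join([str(throttle_rate)] * ports)


rate = 8000
print(f"Minimum spacing between interrupts: {1e6 / rate:.0f} us")
print(f"Full-size frames per interrupt at wire speed: "
      f"{frames_per_interrupt(rate):.1f}")
print(module_option(rate, ports=2))
```

At 8000 ints/s this works out to about ten full-size frames per interrupt in one direction. Small ACK frames arriving on the same port would add to that count. The printed option string is the form that would be passed when loading the driver, for example `modprobe e1000 InterruptThrottleRate=8000,8000` on a two-port system.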
I'm not sure it's quite right to say that we are interested in ultra low latency. Most of our network transfers involve bulk data movement (a few MB or more). We don't care so much about low latency (meaning how long it takes the FIRST byte of data to travel from sender to receiver). We care about aggregate bandwidth: once the pipe is full, how fast can data be moved through it. So we don't care so much if getting the pipe full takes 20 us or 50 us. We just want the data to flow fast once the pipe IS full.
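A quick back-of-the-envelope check of this point, as a Python sketch. The 4 MB transfer size is a hypothetical example of "a few MB", the 900 Mb/s rate is the figure mentioned earlier for UDP, and the model (a fixed startup delay plus size divided by rate) is a deliberate simplification.

```python
def transfer_time_seconds(size_bytes, rate_bits_per_second, startup_seconds):
    """Simple model: fixed startup delay plus time to stream the payload."""
    return startup_seconds + size_bytes * 8 / rate_bits_per_second


SIZE_BYTES = 4 * 1024 * 1024  # hypothetical 4 MB bulk transfer
RATE_BPS = 900e6              # 900 Mb/s, as seen with UDP in this thread

fast = transfer_time_seconds(SIZE_BYTES, RATE_BPS, 20e-6)  # 20 us to fill pipe
slow = transfer_time_seconds(SIZE_BYTES, RATE_BPS, 50e-6)  # 50 us to fill pipe

print(f"Transfer time with 20 us startup: {fast * 1e3:.3f} ms")
print(f"Transfer time with 50 us startup: {slow * 1e3:.3f} ms")
print(f"Relative difference: {(slow - fast) / fast:.3%}")
```

For a transfer of this size the 30 us difference in startup is well under 0.1% of the total time, whereas any shortfall in sustained rate shows up in full.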
Welcome, it's an interesting discussion. Hope we can come to a good conclusion.
Thank you. Carsten will post more info and answers later today.
Cheers,
Bruce