Re: [PATCH RFC net-next] vxlan: GRO support at tunnel layer
From: Ramu Ramamurthy <hidden>
Date: 2015-06-28 17:20:43
On 2015-06-26 17:46, Rick Jones wrote:
On 06/26/2015 04:09 PM, Tom Herbert wrote:
Add calls to gro_cells infrastructure to do GRO when receiving on a tunnel.

Testing:

Ran 200 netperf TCP_STREAM instance

  - With fix (GRO enabled on VXLAN interface)

    Verify GRO is happening.
    9084 MBps tput
    3.44% CPU utilization

  - Without fix (GRO disabled on VXLAN interface)

    Verified no GRO is happening.
    9084 MBps tput
    5.54% CPU utilization

This has been an area of interest so:

Tested-by: Rick Jones <redacted>

Some single-stream results between two otherwise identical systems with 82599ES NICs in them, one running a 4.1.0-rc1+ kernel from a davem tree from a while ago, the other running 4.1.0+ from a davem tree pulled yesterday upon which I've applied the patch.

Netperf command used:

netperf -l 30 -H <IP> -t TCP_MAERTS -c -- -O throughput,local_cpu_util,local_cpu_peak_util,local_cpu_peak_id,local_sd

First, inbound to the unpatched system from the patched:

MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.21 () port 0 AF_INET : demo
Throughput Local Local   Local   Local
           CPU   Peak    Peak    Service
           Util  Per CPU Per CPU Demand
           %     Util %  ID
5487.42    6.01  99.83   0       2.872
5580.83    6.20  99.16   0       2.911
5445.52    5.68  98.92   0       2.734
5653.36    6.24  99.80   0       2.891
5187.56    5.66  97.41   0       2.858

Second, inbound to the patched system from the unpatched:

MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.0.22 () port 0 AF_INET : demo
Throughput Local Local   Local   Local
           CPU   Peak    Peak    Service
           Util  Per CPU Per CPU Demand
           %     Util %  ID
6933.29    3.19  93.67   3       1.208
7031.35    3.34  95.08   3       1.244
7006.28    3.27  94.55   3       1.223
6948.62    3.09  93.20   3       1.165
7007.80    3.22  94.34   3       1.206

Comparing the service demands shows a > 50% reduction in overhead.
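
The "> 50%" figure can be checked directly from the service-demand column of the two netperf runs quoted above. The following is only a minimal sketch of that arithmetic: the numbers are copied from the quoted results, while the variable names and the choice of a simple mean over the five runs are assumptions made for illustration, not something the original message specifies.

```python
from statistics import mean

# Local service demand per run, copied from the two netperf result sets.
# Names are illustrative only.
unpatched_sd = [2.872, 2.911, 2.734, 2.891, 2.858]  # receiver without the patch
patched_sd = [1.208, 1.244, 1.223, 1.165, 1.206]    # receiver with the patch

# Assumption: compare the arithmetic means of the five runs.
mean_unpatched = mean(unpatched_sd)
mean_patched = mean(patched_sd)

# Fractional reduction in service demand on the patched receiver.
reduction = 1.0 - mean_patched / mean_unpatched

print(f"mean service demand, unpatched: {mean_unpatched:.3f}")
print(f"mean service demand, patched:   {mean_patched:.3f}")
print(f"reduction: {reduction * 100:.1f}%")
```

Under that assumption the reduction comes out at roughly 57.6%, which is consistent with the "> 50%" statement.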
Rick, in your test, are you seeing GRO become effective on the vxlan interface with the 82599ES NIC? (That is, does tcpdump on the vxlan interface show frames larger than the MTU of that interface, and does a kernel trace show vxlan_gro_receive() being hit?)

Throughputs of 5.5 Gbps (or the improved 7 Gbps) lead me to suspect that GRO is still not effective on the vxlan interface in your test with the 82599ES NIC, because when vxlan GRO became effective with the patch I suggested earlier, I could see throughput of ~8.5 Gbps on that NIC.