Re: [PATCH v2 net-next 0/4] net: batched receive in GRO path
From: Edward Cree <hidden>
Date: 2018-09-11 23:35:20
On 07/09/18 03:32, Eric Dumazet wrote:
> Adding this complexity and icache pressure needs more experimental results.
> What about RPC workloads (eg 100 concurrent netperf -t TCP_RR -- -r 8000,8000 )
>
> Thanks.

Some more results.  Note that the TCP_STREAM figures given in the cover letter
were '-m 1450'; when I run that with '-m 8000' I hit line rate on my 10G NIC on
both the old and new code.  Also, these tests are still all with IRQs bound to
a single core on the RX side.

A further note: the Code Under Test is running on the netserver side (RX side
for TCP_STREAM tests); the netperf side is running stock RHEL7u3 (kernel
3.10.0-514.el7.x86_64).  This potentially matters more for the TCP_RR test as
both sides have to receive data.

TCP_STREAM, 8000 bytes, GRO enabled (4 streams)
old: 9.415 Gbit/s
new: 9.417 Gbit/s
(Welch p = 0.087, n₁ = n₂ = 3)
There was however a noticeable reduction in *TX* CPU usage, of about 15%.  I
don't know why that should be (changes in ack timing, perhaps?)

TCP_STREAM, 8000 bytes, GRO disabled (4 streams)
old: 5.200 Gbit/s
new: 5.839 Gbit/s (12.3% faster)
(Welch p < 0.001, n₁ = n₂ = 6)

TCP_RR, 8000 bytes, GRO enabled (100 streams)
(FoM is one-way latency, 0.5 / tps)
old: 855.833 us
new: 862.033 us (0.7% slower)
(Welch p = 0.040, n₁ = n₂ = 6)

TCP_RR, 8000 bytes, GRO disabled (100 streams)
old: 962.733 us
new: 871.417 us (9.5% faster)
(Welch p < 0.001, n₁ = n₂ = 6)

Conclusion: with GRO on we pay a small but real RR penalty.  With GRO off
(thus also with traffic that can't be coalesced) we get a noticeable speed
boost from being able to use netif_receive_skb_list_internal().

-Ed
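
For anyone wanting to check the arithmetic, here is a minimal sketch of how the figures above can be derived. It assumes the TCP_RR figure of merit is exactly 0.5 / tps converted to microseconds, and that the percentages are relative changes against the old value. The function names are mine and hypothetical. The per-run samples were not posted, so welch_t() is shown only as the standard Welch statistic with Welch-Satterthwaite degrees of freedom; it is not necessarily the script that produced the p-values quoted here.

```python
import math
import statistics


def one_way_latency_us(tps):
    """One-way latency in microseconds from a TCP_RR transaction rate.

    A transaction is one round trip, so half of 1/tps is the one-way time.
    """
    return 0.5 / tps * 1e6


def pct_change(old, new):
    """Relative change of new against old, in percent (negative = smaller)."""
    return (new - old) / old * 100.0


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two samples.

    Uses unbiased sample variances; does not assume equal variances.
    Turning (t, df) into a p-value needs a Student-t CDF, omitted here.
    """
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Means taken from the results in this mail.
print(f"STREAM, GRO off: {pct_change(5.200, 5.839):+.1f}% throughput")
print(f"RR, GRO on:      {pct_change(855.833, 862.033):+.1f}% latency")
print(f"RR, GRO off:     {pct_change(962.733, 871.417):+.1f}% latency")
```

Note that "faster" for the RR tests is read here as a reduction in latency, not an increase in tps; the two differ slightly for changes of this size.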