Re: [PATCH v2 net-next 0/4] net: batched receive in GRO path

From: Edward Cree <hidden>
Date: 2018-09-11 23:35:20

On 07/09/18 03:32, Eric Dumazet wrote:

> Adding this complexity and icache pressure needs more experimental results.
> What about RPC workloads (eg 100 concurrent netperf -t TCP_RR -- -r 8000,8000 )
>
> Thanks.

Some more results.  Note that the TCP_STREAM figures given in the cover
 letter were '-m 1450'; when I run that with '-m 8000' I hit line rate on
 my 10G NIC on both the old and new code.  Also, these tests are still all
 with IRQs bound to a single core on the RX side.
A further note: the Code Under Test is running on the netserver side (RX
 side for TCP_STREAM tests); the netperf side is running stock RHEL7u3
 (kernel 3.10.0-514.el7.x86_64).  This potentially matters more for the
 TCP_RR test as both sides have to receive data.

TCP_STREAM, 8000 bytes, GRO enabled (4 streams)
old: 9.415 Gbit/s
new: 9.417 Gbit/s
(Welch p = 0.087, n₁ = n₂ = 3)
There was, however, a noticeable reduction in *TX* CPU usage, of about 15%.
 I don't know why that should be (changes in ACK timing, perhaps?).

TCP_STREAM, 8000 bytes, GRO disabled (4 streams)
old: 5.200 Gbit/s
new: 5.839 Gbit/s (12.3% faster)
(Welch p < 0.001, n₁ = n₂ = 6)

TCP_RR, 8000 bytes, GRO enabled (100 streams)
(FoM, the figure of merit, is one-way latency: 0.5 / tps, where tps is the
 transaction rate per second reported by netperf)
old: 855.833 us
new: 862.033 us (0.7% slower)
(Welch p = 0.040, n₁ = n₂ = 6)
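
For anyone wanting to reproduce the derived figures, here is a minimal
sketch of the arithmetic, assuming tps is a transactions-per-second rate.
The function names are made up for illustration and are not part of netperf
or of the patch series; the inputs are the mean latencies quoted in this
mail.

```python
def one_way_latency_us(tps):
    """One-way latency in microseconds: half a round trip, 0.5 / tps."""
    return 0.5e6 / tps


def percent_change(old, new):
    """Relative change from old to new, in percent (positive = larger)."""
    return (new - old) / old * 100.0


# Mean one-way latencies (us) as reported in this mail.
print(round(percent_change(855.833, 862.033), 1))  # GRO enabled:  0.7
print(round(percent_change(962.733, 871.417), 1))  # GRO disabled: -9.5
```

Since the figure of merit is a latency, a positive change means slower and
a negative change means faster.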

TCP_RR, 8000 bytes, GRO disabled (100 streams)
old: 962.733 us
new: 871.417 us (9.5% faster)
(Welch p < 0.001, n₁ = n₂ = 6)
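
The "Welch" figures above refer to Welch's unequal-variance t-test.  A
minimal sketch of the statistic follows, using only the Python standard
library.  The per-run samples were not posted, so the two lists below are
invented placeholders, not measurements from these tests, and welch_t is a
hypothetical helper name.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    # Squared standard error of each sample mean.
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Placeholder samples only (NOT data from this thread), n1 = n2 = 6.
old_runs = [10.0, 10.4, 9.8, 10.1, 10.3, 9.9]
new_runs = [9.1, 9.3, 8.9, 9.2, 9.0, 9.4]

t, df = welch_t(old_runs, new_runs)
print(f"t = {t:.3f}, df = {df:.2f}")
```

Turning t and df into a two-sided p-value needs the Student t CDF; if SciPy
is available, scipy.stats.ttest_ind(a, b, equal_var=False) performs the
same test and returns the p-value directly.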

Conclusion: with GRO on, TCP_STREAM throughput is unchanged and we pay a
 small (0.7%) but real RR penalty.  With GRO off (thus also with traffic
 that can't be coalesced) we get a noticeable speed boost (roughly 10-12%)
 from being able to use netif_receive_skb_list_internal().

-Ed
