Re: [PATCH RFC net-next 00/11] udp gso
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: 2018-08-31 17:17:02
On Fri, Aug 31, 2018 at 5:09 AM Paolo Abeni [off-list ref] wrote:
> Hi,
>
> On Tue, 2018-04-17 at 17:07 -0400, Willem de Bruijn wrote:
> > That said, for negotiated flows an inverse GRO feature could
> > conceivably be implemented to reduce rx stack traversal, too.
> > Though due to interleaving of packets on the wire, aggregation
> > would be best effort, similar to TCP TSO and GRO using the PSH
> > bit as packetization signal.
>
> Reviving this old thread, before I forget again. I have some local
> patches implementing UDP GRO in a dual way to the current GSO_UDP_L4
> implementation: several datagrams with the same length are
> aggregated into a single one, and user space receives a single
> larger packet instead of multiple ones. I hope QUIC can leverage
> such a scenario, but I really know nothing about the protocol.
>
> I measure roughly a 50% performance improvement with udpgso_bench
> with respect to UDP GSO, and ~100% using a pktgen sender, and a
> reduced CPU usage on the receiver[1]. Some additional hacking to the
> general GRO bits is required to avoid useless socket lookups for
> ingress UDP packets when UDP_GSO is not enabled.
>
> If there is interest in this topic, I can share some RFC patches
> (hopefully sometime next week).
As Eric pointed out, QUIC reception on mobile clients over the WAN
may not see much gain. But apparently there is a non-trivial amount
of traffic the other way, to servers. Again, the WAN might limit
whatever gain we get, but I do want to look into that. And there are
other high-throughput UDP workloads (with or without QUIC) between
servers.

If you have patches, please do share them. I actually also have a
rough patch that I did not consider ready to share yet. It builds on
Tom's existing socket lookup in udp_gro_receive to detect whether a
local destination exists and whether it has set an option to support
receiving coalesced payloads (along with a cmsg to share the segment
size). Converting udp_recvmsg to split apart gso packets to make this
transparent seems to me to be too complex and not worth the effort.

If a local socket is not found in udp_gro_receive, this could also be
tentatively interpreted as a non-local path (with false positives),
enabling transparent use of GRO + GSO batching on the forwarding path.
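
To make the receive side concrete: if the socket opts in and the kernel
hands up one coalesced payload plus a segment size, the application has
to walk the buffer itself. A minimal user-space sketch of that walk
follows. The names split_coalesced() and struct segment are hypothetical,
and how seg_size reaches the application (the cmsg mentioned above) is
assumed rather than taken from any posted patch. As with GSO on the send
side, only the last segment is assumed to be allowed to be shorter.

```c
#include <stddef.h>

/* Hypothetical view of one datagram inside a coalesced payload. */
struct segment {
	const unsigned char *data;
	size_t len;
};

/*
 * Split a coalesced payload of total_len bytes into datagrams of
 * seg_size bytes each; the final one may be shorter. seg_size == 0
 * is taken to mean "no segment size was reported", so the buffer is
 * returned as a single datagram. Writes at most max_out entries and
 * returns the number written.
 */
static size_t split_coalesced(const unsigned char *buf, size_t total_len,
			      size_t seg_size, struct segment *out,
			      size_t max_out)
{
	size_t off = 0, n = 0;

	if (max_out == 0)
		return 0;

	if (seg_size == 0) {
		out[0].data = buf;
		out[0].len = total_len;
		return 1;
	}

	while (off < total_len && n < max_out) {
		size_t len = total_len - off;

		if (len > seg_size)
			len = seg_size;

		out[n].data = buf + off;
		out[n].len = len;
		n++;
		off += len;
	}

	return n;
}
```

Doing this split in the application rather than in udp_recvmsg keeps
the kernel side simple, which is the trade-off argued for above.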