Thread: 52 messages, 11 authors, 2018-09-03

Re: [PATCH RFC net-next 00/11] udp gso

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: 2018-08-31 17:17:02

On Fri, Aug 31, 2018 at 5:09 AM Paolo Abeni [off-list ref] wrote:
Hi,

On Tue, 2018-04-17 at 17:07 -0400, Willem de Bruijn wrote:
> That said, for negotiated flows an inverse GRO feature could
> conceivably be implemented to reduce rx stack traversal, too.
> Though due to interleaving of packets on the wire, aggregation
> would be best effort, similar to TCP TSO and GRO using the
> PSH bit as packetization signal.
Reviving this old thread before I forget again. I have some local
patches implementing UDP GRO as the dual of the current GSO_UDP_L4
implementation: several datagrams with the same length are aggregated
into a single one, and user space receives a single larger packet
instead of multiple ones. I hope QUIC can leverage such a scenario, but I
really know nothing about the protocol.
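
For illustration only, the aggregation rule described above can be modelled
in user space roughly as below. This is a sketch in Python, not code from
the patches. It assumes that consecutive datagrams of one flow are merged
while they share the same length, that one shorter datagram may close a
batch, and that a batch is capped at a hypothetical max_segs; the name
coalesce and these details are assumptions made for the example.

```python
def coalesce(datagrams, max_segs=64):
    """Merge runs of equal-length datagrams into (payload, seg_size) pairs.

    Illustrative model only. max_segs and the rule that a shorter
    trailing datagram closes the batch are assumptions of this sketch.
    """
    batches = []
    cur = []  # datagrams collected for the batch being built

    for d in datagrams:
        if cur:
            seg_size = len(cur[0])
            full = len(cur) >= max_segs
            closed = len(cur[-1]) != seg_size   # a short segment ended the run
            too_long = len(d) > seg_size        # cannot follow smaller segments
            if full or closed or too_long:
                batches.append((b"".join(cur), seg_size))
                cur = []
        cur.append(d)

    if cur:
        batches.append((b"".join(cur), len(cur[0])))
    return batches
```

With this model, four datagrams of 4, 4, 2 and 4 bytes become two batches:
one 10-byte payload with segment size 4, and one 4-byte payload.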

I measure roughly a 50% performance improvement with udpgso_bench
with respect to UDP GSO, and ~100% using a pktgen sender, and a reduced CPU
usage on the receiver[1]. Some additional hacking to the general GRO
bits is required to avoid useless socket lookups for ingress UDP
packets when UDP_GSO is not enabled.

If there is interest in this topic, I can share some RFC patches
(hopefully sometime next week).
As Eric pointed out, QUIC reception on mobile clients over the WAN
may not see much gain. But apparently there is a non-trivial amount
of traffic the other way, to servers. Again, WAN might limit whatever
gain we get, but I do want to look into that. And there are other
high-throughput UDP workloads (with or without QUIC) between servers.

If you have patches, please do share them. I actually also have a rough
patch that I did not consider ready to share yet. It builds on Tom's
existing socket lookup in udp_gro_receive to detect whether a local
destination exists and whether it has set an option to support receiving
coalesced payloads (along with a cmsg to share the segment size).
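
As a rough sketch of what a receiver could do with such a coalesced
payload, assuming the segment size has already been read from the cmsg
(how that cmsg is named and parsed is left out here, and split_segments is
a hypothetical helper name), the application side might look like this in
Python:

```python
def split_segments(payload, seg_size):
    """Split a coalesced payload back into the original datagrams.

    Illustrative only: seg_size stands for the value the receiver would
    obtain from the cmsg. Every segment has seg_size bytes except
    possibly the last one, which may be shorter.
    """
    if seg_size <= 0:
        raise ValueError("seg_size must be positive")
    return [payload[i:i + seg_size] for i in range(0, len(payload), seg_size)]
```

For a 10-byte payload and a segment size of 4 this yields segments of 4, 4
and 2 bytes. Doing the split in the application keeps udp_recvmsg itself
unchanged.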

Converting udp_recvmsg to split apart gso packets to make this
transparent seems to me to be too complex and not worth the effort.

If a local socket is not found in udp_gro_receive, this could also be
tentatively interpreted as a non-local path (with false positives), enabling
transparent use of GRO + GSO batching on the forwarding path.