Thread: 52 messages, 11 authors, 2018-09-03

Re: [PATCH RFC net-next 00/11] udp gso

From: Paolo Abeni <pabeni@redhat.com>
Date: 2018-08-31 17:51:48

On Fri, 2018-08-31 at 09:08 -0400, Willem de Bruijn wrote:
On Fri, Aug 31, 2018 at 5:09 AM Paolo Abeni [off-list ref] wrote:
quoted
Hi,

On Tue, 2018-04-17 at 17:07 -0400, Willem de Bruijn wrote:
quoted
That said, for negotiated flows an inverse GRO feature could
conceivably be implemented to reduce rx stack traversal, too.
Though due to interleaving of packets on the wire, aggregation
would be best effort, similar to TCP TSO and GRO using the
PSH bit as packetization signal.
Reviving this old thread before I forget again. I have some local
patches implementing UDP GRO as the dual of the current GSO_UDP_L4
implementation: several datagrams with the same length are aggregated
into a single one, and user space receives a single larger packet
instead of multiple ones. I hope QUIC can leverage such a scenario, but I
really know nothing about the protocol.
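
To make the aggregation rule above concrete, here is a minimal user-space
sketch, not the kernel patches themselves. It assumes the only merge
criterion is "consecutive datagrams of equal length"; the coalesce() name
and the max_segs cap are hypothetical, and real GRO would also have to
match on the flow.

```python
from typing import List, Tuple


def coalesce(datagrams: List[bytes], max_segs: int = 64) -> List[Tuple[bytes, int]]:
    """Greedily merge runs of consecutive equal-length datagrams.

    Returns (payload, segment_size) pairs. max_segs is a hypothetical
    cap on how many datagrams may be merged into one payload.
    """
    out: List[Tuple[bytes, int]] = []
    i = 0
    while i < len(datagrams):
        seg = len(datagrams[i])
        j = i + 1
        # Extend the run while the next datagram has the same length.
        while j < len(datagrams) and j - i < max_segs and len(datagrams[j]) == seg:
            j += 1
        out.append((b"".join(datagrams[i:j]), seg))
        i = j
    return out
```

A datagram whose length differs from its predecessor simply starts a new
run, so a mixed stream degrades to one payload per datagram.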

I measure roughly a 50% performance improvement over UDP GSO alone with
udpgso_bench, and ~100% using a pktgen sender, plus reduced CPU
usage on the receiver[1]. Some additional hacking on the general GRO
bits is required to avoid useless socket lookups for ingress UDP
packets when UDP_GSO is not enabled.

If there is interest in this topic, I can share some RFC patches
(hopefully sometime next week).
As Eric pointed out, QUIC reception on mobile clients over the WAN
may not see much gain. But apparently there is a non-trivial amount
of traffic the other way, to servers. Again, WAN might limit whatever
gain we get, but I do want to look into that. And there are other UDP high
throughput workloads (with or without QUIC) between servers.

If you have patches, please do share them. 
I'll try to clean them up and send them next week (as RFC).
I actually also have a rough
patch that I did not consider ready to share yet. Based on Tom's existing
socket lookup in udp_gro_receive to detect whether a local destination
exists and whether it has set an option to support receiving coalesced
payloads (along with a cmsg to share the segment size).
That is more or less what I'm doing here.
Side note: I had to test it on bare metal, as veth/lo do not trigger the
GRO path, so a selftest for this feature is not so straightforward.
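
The receive-side contract discussed above (one coalesced payload plus a
cmsg carrying the segment size) could be consumed by an application
roughly as follows. This is only a sketch: split_coalesced() is a
hypothetical helper, and it assumes segment_size has already been read
from the cmsg and that only the last segment may be shorter.

```python
from typing import List


def split_coalesced(payload: bytes, segment_size: int) -> List[bytes]:
    """Split one coalesced UDP payload back into its original datagrams.

    segment_size is assumed to come from the cmsg delivered with the
    payload; the final datagram may be shorter than segment_size.
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    if not payload:
        # A zero-length payload is a single empty datagram.
        return [payload]
    return [payload[off:off + segment_size]
            for off in range(0, len(payload), segment_size)]
```

The split happens entirely in user space on a buffer obtained with a
single recvmsg(), which is the point of delivering the large packet.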
Converting udp_recvmsg to split apart gso packets to make this
transparent seems to me to be too complex and not worth the effort.
Agreed. Moreover, doing many small recvmsg() calls instead of a single
large one will hurt performance badly due to PTI and
HARDENED_USERCOPY.
If a local socket is not found in udp_gro_receive, this could also be
tentatively interpreted as a non-local path (with false positives), enabling
transparent use of GRO + GSO batching on the forwarding path.
That sounds interesting, even if false positives look dangerous to me.
Just to be on the same page, which false positive examples are you
thinking of? UDP sockets bound to a local address behind NAT?

Cheers,

Paolo