Re: [PATCH v1 1/6] net: Generalize udp based tunnel offload
From: John Fastabend <john.fastabend@gmail.com>
Date: 2015-12-08 07:34:08
On 15-12-02 04:15 PM, Tom Herbert wrote:
> On Wed, Dec 2, 2015 at 3:35 PM, John Fastabend [off-list ref] wrote:
>> [...]
>>
>>>> I wonder why we need protocol-generic offloads? I know there are currently a lot of overlay encapsulation protocols. Are there many more coming?
>>>
>>> Yes, and assume that more will keep coming, with no limit in sight (for instance, I just noticed today that there is a netdev1.1 talk on supporting GTP in the kernel). Besides, this problem space is not limited to offloading encapsulation protocols; it is about how to generalize offload of any transport, IPv[46], application protocols, protocols implemented in user space, security protocols, etc.
>>>
>>>> Besides, this offload is about TSO and RSS, and they do need to parse the packet to find where the inner header starts. It is not only about checksum offloading.
>>>
>>> RSS does not require the device to parse the inner header. All the UDP encapsulation protocols being defined set the outer UDP source port to a flow entropy value, and most devices already support RSS hashing over UDP ports (it just needs to be enabled), so this works fine with dumb NICs. In fact, this is one of the main motivations for encapsulating in UDP in the first place: to leverage existing RSS and ECMP mechanisms. The more general solution is to use the IPv6 flow label (RFC6438). We need HW support to include the flow label in the hash for ECMP and RSS, but once we have that, much of the motivation for using UDP goes away and we can get back to just doing GRE/IP, IPIP, MPLS/IP, etc. (and hence eliminate the overhead and complexity of UDP encapsulation).
>>>
>>>> Please provide a sketch of a protocol-generic API that can tell hardware where an inner protocol header starts, that supports VXLAN, VXLAN-GPE, Geneve and IPv6 extension headers, and that knows which protocol starts at that point.
>>>
>>> BPF. Implementing protocol-generic offloads is not just a HW concern either; adding kernel GRO code for every possible protocol that comes along doesn't scale well. This becomes especially obvious when we consider how to provide offloads for application protocols. If the kernel provides a programmable framework for the offloads, then application protocols such as QUIC could use that without needing to hack the kernel to support the specific protocol (which no one wants!). Application protocol parsing in KCM and some other uses of BPF have already foreshadowed this, and we are working on a prototype of a BPF-programmable engine in the kernel. Presumably, this same model could eventually be applied as the HW API for programmable offload.
>>
>> Just keying off the last statement there... I think BPF programs are going to be hard to translate into hardware for most devices. The problem is that BPF programs in general lack structure. A parse graph would be much friendlier for hardware, or at minimum the BPF program would need to be some sort of well-structured program so that a driver could turn it into a parse graph.
>
> This might be relevant: http://richard.systems/research/pdf/IEEE_HPSR_BPF_OPENFLOW.pdf
Thanks Tom, interesting read, but they seem to argue for a BPF engine in hardware, which I'm still not convinced is necessary, and the numbers provided are for a 1Gbps link, whereas numbers for 10Gbps/100Gbps+ would be more valuable. I am still leaning towards a fully programmable parse graph plus a set of basic actions (push/pop/set/fwd/etc.). This would be useful for other features, not just checksum offloads. I guess it doesn't necessarily exclude also having ones' complement checksum logic, though.

.John