Re: [PATCH bpf-next v5 2/5] bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
From: Peter Oskolkov <hidden>
Date: 2019-02-01 16:48:26
On Thu, Jan 31, 2019 at 5:47 PM Willem de Bruijn [off-list ref] wrote:
On Thu, Jan 31, 2019 at 5:04 PM Daniel Borkmann [off-list ref] wrote:quoted
On 01/31/2019 12:51 AM, Peter Oskolkov wrote:quoted
This patch implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers to packets (e.g. IP/GRE, GUE, IPIP). This is useful when thousands of different short-lived flows should be encapped, each with different and dynamically determined destination. Although lwtunnels can be used in some of these scenarios, the ability to dynamically generate encap headers adds more flexibility, e.g. when routing depends on the state of the host (reflected in global bpf maps). Signed-off-by: Peter Oskolkov <redacted>quoted
quoted
+int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress) +{ + struct iphdr *iph; + bool ipv4; + int err; + + if (unlikely(len < sizeof(struct iphdr) || len > LWT_BPF_MAX_HEADROOM)) + return -EINVAL; + + /* validate protocol and length */ + iph = (struct iphdr *)hdr; + if (iph->version == 4) { + ipv4 = true; + if (unlikely(len < iph->ihl * 4)) + return -EINVAL; + } else if (iph->version == 6) { + ipv4 = false; + if (unlikely(len < sizeof(struct ipv6hdr))) + return -EINVAL; + } else { + return -EINVAL; + } + + if (ingress) + err = skb_cow_head(skb, len + skb->mac_len); + else + err = skb_cow_head(skb, + len + LL_RESERVED_SPACE(skb_dst(skb)->dev)); + if (unlikely(err)) + return err; + + /* push the encap headers and fix pointers */ + skb_reset_inner_headers(skb); + skb->encapsulation = 1; + skb_push(skb, len); + if (ingress) + skb_postpush_rcsum(skb, iph, len); + skb_reset_network_header(skb); + memcpy(skb_network_header(skb), hdr, len); + bpf_compute_data_pointers(skb);Does this work transparently with GSO as well or would we need to update shared info for this (like in nat64 case, for example)?Good point. It does need to update the gso_type to include the tunnel type, similar to iptunnel_handle_offloads. Only, the nice feature of this interface is that it is encap protocol independent, which implies that it does not know the correct type. I don't think that we want to allow programs to write the gso_type themselves. With GSO_PARTIAL, perhaps specifying the exact tunnel type can be avoided as long as it is a fixed prefix to replicate? The transport layer size does not change, so no need to recompute gso_segs? Either way, this seems non-trivial enough to me to do in a separate follow-on patch. For now just fail if skb_is_gso.
Thanks, Willem, for the details! I'll re-send the patchset with the additional check that skb_is_gso() is false.