Re: [PATCH bpf-next v5 2/5] bpf: implement BPF_LWT_ENCAP_IP mode in... | netdev

Re: [PATCH bpf-next v5 2/5] bpf: implement BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap

From: Peter Oskolkov <hidden>
Date: 2019-02-01 16:48:26

On Thu, Jan 31, 2019 at 5:47 PM Willem de Bruijn
[off-list ref] wrote:

On Thu, Jan 31, 2019 at 5:04 PM Daniel Borkmann [off-list ref] wrote:

quoted

On 01/31/2019 12:51 AM, Peter Oskolkov wrote:

quoted

This patch implements BPF_LWT_ENCAP_IP mode in bpf_lwt_push_encap
BPF helper. It enables BPF programs (specifically, BPF_PROG_TYPE_LWT_IN
and BPF_PROG_TYPE_LWT_XMIT prog types) to add IP encapsulation headers
to packets (e.g. IP/GRE, GUE, IPIP).

This is useful when thousands of different short-lived flows should be
encapped, each with different and dynamically determined destination.
Although lwtunnels can be used in some of these scenarios, the ability
to dynamically generate encap headers adds more flexibility, e.g.
when routing depends on the state of the host (reflected in global bpf
maps).

Signed-off-by: Peter Oskolkov <redacted>

quoted

+int bpf_lwt_push_ip_encap(struct sk_buff *skb, void *hdr, u32 len, bool ingress)
+{
+     struct iphdr *iph;
+     bool ipv4;
+     int err;
+
+     if (unlikely(len < sizeof(struct iphdr) || len > LWT_BPF_MAX_HEADROOM))
+             return -EINVAL;
+
+     /* validate protocol and length */
+     iph = (struct iphdr *)hdr;
+     if (iph->version == 4) {
+             ipv4 = true;
+             if (unlikely(len < iph->ihl * 4))
+                     return -EINVAL;
+     } else if (iph->version == 6) {
+             ipv4 = false;
+             if (unlikely(len < sizeof(struct ipv6hdr)))
+                     return -EINVAL;
+     } else {
+             return -EINVAL;
+     }
+
+     if (ingress)
+             err = skb_cow_head(skb, len + skb->mac_len);
+     else
+             err = skb_cow_head(skb,
+                                len + LL_RESERVED_SPACE(skb_dst(skb)->dev));
+     if (unlikely(err))
+             return err;
+
+     /* push the encap headers and fix pointers */
+     skb_reset_inner_headers(skb);
+     skb->encapsulation = 1;
+     skb_push(skb, len);
+     if (ingress)
+             skb_postpush_rcsum(skb, iph, len);
+     skb_reset_network_header(skb);
+     memcpy(skb_network_header(skb), hdr, len);
+     bpf_compute_data_pointers(skb);

Does this work transparently with GSO as well or would we need to
update shared info for this (like in nat64 case, for example)?

Good point. It does need to update the gso_type to include the tunnel
type, similar to iptunnel_handle_offloads.

Only, the nice feature of this interface is that it is encap protocol
independent, which implies that it does not know the correct type.

I don't think that we want to allow programs to write the gso_type themselves.

With GSO_PARTIAL, perhaps specifying the exact tunnel type can be
avoided as long as it is a fixed prefix to replicate?

The transport layer size does not change, so no need to recompute gso_segs?

Either way, this seems non-trivial enough to me to do in a separate
follow-on patch. For now just fail if skb_is_gso.

Thanks, Willem, for the details! I'll re-send the patchset with the
additional check that skb_is_gso() is false.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help