Re: [PATCH net-next v3 3/4] bpf: BPF for lightweight tunnel infrastructure

From: Alexei Starovoitov <hidden>
Date: 2016-11-30 00:15:14

On Tue, Nov 29, 2016 at 02:21:22PM +0100, Thomas Graf wrote:

Registers new BPF program types which correspond to the LWT hooks:
  - BPF_PROG_TYPE_LWT_IN   => dst_input()
  - BPF_PROG_TYPE_LWT_OUT  => dst_output()
  - BPF_PROG_TYPE_LWT_XMIT => lwtunnel_xmit()

The separate program types are required to differentiate between the
capabilities each LWT hook allows:

 * Programs attached to dst_input() or dst_output() are restricted and
   may only read the data of an skb. This prevent modification and
   possible invalidation of already validated packet headers on receive
   and the construction of illegal headers while the IP headers are
   still being assembled.

 * Programs attached to lwtunnel_xmit() are allowed to modify packet
   content as well as prepending an L2 header via a newly introduced
   helper bpf_skb_push(). This is safe as lwtunnel_xmit() is invoked
   after the IP header has been assembled completely.

All BPF programs receive an skb with L3 headers attached and may return
one of the following error codes:

 BPF_OK - Continue routing as per nexthop
 BPF_DROP - Drop skb and return EPERM
 BPF_REDIRECT - Redirect skb to device as per redirect() helper.
                (Only valid in lwtunnel_xmit() context)

The return codes are binary compatible with their TC_ACT_
relatives to ease compatibility.

Signed-off-by: Thomas Graf <tgraf@suug.ch>

...

+#define LWT_BPF_MAX_HEADROOM 128

why 128?
btw I'm thinking for XDP to use 256, so metadata can be stored in there.

+static int run_lwt_bpf(struct sk_buff *skb, struct bpf_lwt_prog *lwt,
+		       struct dst_entry *dst, bool can_redirect)
+{
+	int ret;
+
+	/* Preempt disable is needed to protect per-cpu redirect_info between
+	 * BPF prog and skb_do_redirect(). The call_rcu in bpf_prog_put() and
+	 * access to maps strictly require a rcu_read_lock() for protection,
+	 * mixing with BH RCU lock doesn't work.
+	 */
+	preempt_disable();
+	rcu_read_lock();
+	bpf_compute_data_end(skb);
+	ret = BPF_PROG_RUN(lwt->prog, skb);
+	rcu_read_unlock();
+
+	switch (ret) {
+	case BPF_OK:
+		break;
+
+	case BPF_REDIRECT:
+		if (!can_redirect) {
+			WARN_ONCE(1, "Illegal redirect return code in prog %s\n",
+				  lwt->name ? : "<unknown>");
+			ret = BPF_OK;
+		} else {
+			ret = skb_do_redirect(skb);

I think this assumes that program did bpf_skb_push and L2 header is present.
Would it make sense to check that mac_header < network_header here to make
sure that it actually happened? I think the cost of single 'if' isn't much.
Also skb_do_redirect() can redirect to l3 tunnels like ipip ;)
so program shouldn't be doing bpf_skb_push in such case...
May be rename bpf_skb_push to bpf_skb_push_l2 ?
since it's doing skb_reset_mac_header(skb); at the end of it?
Or it's probably better to use 'flags' argument to tell whether
bpf_skb_push() should set mac_header or not ? Then this bit:

+		case BPF_OK:
+			/* If the L3 header was expanded, headroom might be too
+			 * small for L2 header now, expand as needed.
+			 */
+			ret = xmit_check_hhlen(skb);

will work fine as well...
which probably needs "mac_header wasn't set" check? or it's fine?

All bpf bits look great. Thanks!

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help