Re: [PATCH net-next 2/2] tun: allow to attach ebpf socket filter

From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: 2017-12-31 10:15:02
Also in: lkml

On Fri, Dec 29, 2017 at 3:44 AM, Jason Wang [off-list ref] wrote:

This patch allows userspace to attach eBPF filter to tun. This will
allow to implement VM dataplane filtering in a more efficient way
compared to cBPF filter.

Is the idea to allow the trusted hypervisor to install these programs,
or the untrusted guests?

eBPF privilege escalations like those recently described in
https://lwn.net/Articles/742170/ would give me pause to expose
this to guests.

quoted hunk ↗ jump to hunk

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c           | 26 ++++++++++++++++++++++++++
 include/uapi/linux/if_tun.h |  1 +
 2 files changed, 27 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 0853829..6e9452b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c

@@ -238,6 +238,7 @@ struct tun_struct {
        struct tun_pcpu_stats __percpu *pcpu_stats;
        struct bpf_prog __rcu *xdp_prog;
        struct tun_prog __rcu *steering_prog;
+       struct tun_prog __rcu *filter_prog;
 };

 static int tun_napi_receive(struct napi_struct *napi, int budget)

@@ -984,12 +985,25 @@ static void tun_automq_xmit(struct tun_struct *tun, struct sk_buff *skb)
 #endif
 }

+static unsigned int run_ebpf_filter(struct tun_struct *tun,
+                                   struct sk_buff *skb,
+                                   int len)
+{
+       struct tun_prog *prog = rcu_dereference(tun->filter_prog);
+
+       if (prog)
+               len = bpf_prog_run_clear_cb(prog->prog, skb);
+
+       return len;
+}
+
 /* Net device start xmit */
 static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 {
        struct tun_struct *tun = netdev_priv(dev);
        int txq = skb->queue_mapping;
        struct tun_file *tfile;
+       int len = skb->len;

        rcu_read_lock();
        tfile = rcu_dereference(tun->tfiles[txq]);

@@ -1015,9 +1029,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
            sk_filter(tfile->socket.sk, skb))
                goto drop;

+       len = run_ebpf_filter(tun, skb, len);
+       if (!len)
+               goto drop;
+

This adds a second filter step independent of the sk_filter call above.
Perhaps the two filter interfaces can map onto to the same instance.
I imagine that qemu never programs SO_ATTACH_FILTER.

More importantly, should this program just return a boolean pass or
drop. Taking a length and trimming may introduce bugs later on if the
stack parses the packet unconditionally, expecting a minimum size
to be present.

This was the reason for introducing sk_filter_trim_cap and using that
in other sk_filter sites.

A quick scan shows that tun_put_user expects a full vlan tag to exist
if skb_vlan_tag_present(skb), for instance. If trimmed to below this
length the final call to skb_copy_datagram_iter may have negative
length.

This is an issue with the existing sk_filter call as much as with the
new run_ebpf_filter call.

        if (unlikely(skb_orphan_frags_rx(skb, GFP_ATOMIC)))
                goto drop;

+       if (pskb_trim(skb, len))
+               goto drop;
+
        skb_tx_timestamp(skb);

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help