Re: [RFC PATCH 4/5] net: filter: run cgroup eBPF programs
From: Alexei Starovoitov <hidden>
Date: 2016-08-17 18:23:34
On Wed, Aug 17, 2016 at 11:20:29AM -0700, Alexei Starovoitov wrote:
On Wed, Aug 17, 2016 at 04:00:47PM +0200, Daniel Mack wrote:
If CONFIG_CGROUP_BPF is enabled, and the cgroup associated with the
receiving socket has an eBPF programs installed, run them from
sk_filter_trim_cap().

eBPF programs used in this context are expected to either return 1 to
let the packet pass, or != 1 to drop them. The programs have access to
the full skb, including the MAC headers.

This patch only implements the call site for ingress packets.

Signed-off-by: Daniel Mack <daniel@zonque.org>
---
 net/core/filter.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index c5d8332..a1dd94b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -52,6 +52,44 @@
 #include <net/dst.h>
 #include <net/sock_reuseport.h>
 
+#ifdef CONFIG_CGROUP_BPF
+static int sk_filter_cgroup_bpf(struct sock *sk, struct sk_buff *skb,
+				enum bpf_attach_type type)
+{
+	struct sock_cgroup_data *skcd = &sk->sk_cgrp_data;
+	struct cgroup *cgrp = sock_cgroup_ptr(skcd);
+	struct bpf_prog *prog;
+	int ret = 0;
+
+	rcu_read_lock();
+
+	switch (type) {
+	case BPF_ATTACH_TYPE_CGROUP_EGRESS:
+		prog = rcu_dereference(cgrp->bpf_egress);
+		break;
+	case BPF_ATTACH_TYPE_CGROUP_INGRESS:
+		prog = rcu_dereference(cgrp->bpf_ingress);
+		break;
+	default:
+		WARN_ON_ONCE(1);
+		ret = -EINVAL;
+		break;
+	}
+
+	if (prog) {

I really like how in this version of the patches it became
a single load+cmp of per-packet cost when this feature is off.
Please move
+	struct cgroup *cgrp = sock_cgroup_ptr(skcd);
into if (prog) {..} to make sure it's actually single load.
The compiler cannot avoid that load when it's placed at the top.
sorry. brain fart. it is two loads. scratch that.
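A rough userspace model of why the disabled case costs two loads rather than one. The struct layouts and the names fake_prog, fake_cgroup, fake_sock and ingress_prog_attached() are made up for illustration and are not the kernel definitions; the real sock_cgroup_ptr() also does more than a plain pointer read.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct bpf_prog, struct cgroup and struct sock. */
struct fake_prog { int id; };

struct fake_cgroup {
	struct fake_prog *bpf_ingress;	/* NULL when nothing is attached */
};

struct fake_sock {
	struct fake_cgroup *cgrp;	/* models the pointer behind sk->sk_cgrp_data */
};

/*
 * Per-packet check when no program is attached:
 *   load 1: sk->cgrp            (socket -> cgroup pointer)
 *   load 2: cgrp->bpf_ingress   (cgroup -> program pointer)
 *   cmp:    prog != NULL
 * Load 2 needs the result of load 1, so the cgroup lookup cannot be
 * pushed inside "if (prog)": prog itself is read through cgrp.
 */
static bool ingress_prog_attached(const struct fake_sock *sk)
{
	const struct fake_cgroup *cgrp = sk->cgrp;		/* load 1 */
	const struct fake_prog *prog = cgrp->bpf_ingress;	/* load 2 */

	return prog != NULL;					/* cmp */
}
```

In this model the only way to get down to a single load would be to keep the program pointer directly in the socket, which is not what the patch does.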
+		unsigned int offset = skb->data - skb_mac_header(skb);
+
+		__skb_push(skb, offset);
+		ret = bpf_prog_run_clear_cb(prog, skb) > 0 ? 0 : -EPERM;

that doesn't match commit log. The above '> 0' makes sense to me though.
If we want to do it for 1 only we have to define it in uapi/bpf.h
as action code, so we can extend to 2, 3 in the future if necessary.

It also has to be bpf_prog_run_save_cb() (as sk_filter_trim_cap() does)
instead of bpf_prog_run_clear_cb().
See commit ff936a04e5f2 ("bpf: fix cb access in socket filter programs")
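To make the mismatch concrete, here is a small sketch of the two possible mappings from program return value to verdict. The enum and its names (CGROUP_BPF_DROP, CGROUP_BPF_PASS) are hypothetical and only show what an action code in uapi/bpf.h might look like; nothing like it exists in this patch set.

```c
#include <errno.h>

/* Hypothetical action codes, for illustration only. */
enum cgroup_bpf_action {
	CGROUP_BPF_DROP = 0,
	CGROUP_BPF_PASS = 1,
	/* 2, 3, ... would stay free for future actions */
};

/* What the code in the patch does: any non-zero return passes the packet. */
static int verdict_positive(unsigned int prog_ret)
{
	return prog_ret > 0 ? 0 : -EPERM;
}

/* What the commit log describes: only 1 passes, everything else drops. */
static int verdict_exact(unsigned int prog_ret)
{
	return prog_ret == CGROUP_BPF_PASS ? 0 : -EPERM;
}
```

The two helpers agree for 0 and 1 and differ for every other value: a program returning 2 passes under the first and is dropped under the second. That is why the exact-match variant only makes sense if the values are defined as action codes up front.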
+		__skb_pull(skb, offset);
+	}
+
+	rcu_read_unlock();
+
+	return ret;
+}
+#endif /* !CONFIG_CGROUP_BPF */
+
 /**
  *	sk_filter_trim_cap - run a packet through a socket filter
  *	@sk: sock associated with &sk_buff
@@ -78,6 +116,12 @@ int sk_filter_trim_cap(struct sock *sk, struct sk_buff *skb, unsigned int cap)
 	if (skb_pfmemalloc(skb) && !sock_flag(sk, SOCK_MEMALLOC))
 		return -ENOMEM;
 
+#ifdef CONFIG_CGROUP_BPF
+	err = sk_filter_cgroup_bpf(sk, skb, BPF_ATTACH_TYPE_CGROUP_INGRESS);
+	if (err)
+		return err;
+#endif
+
 	err = security_sock_rcv_skb(sk, skb);
 	if (err)
 		return err;
--
2.5.5
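On the bpf_prog_run_save_cb() vs bpf_prog_run_clear_cb() remark above, a simplified userspace model of the difference may help. Everything here is an assumption made for illustration: fake_skb, CB_LEN, run_clear_cb(), run_save_cb() and scribble_prog() are invented names, and the kernel helpers only touch cb when the program actually accesses it, which this sketch ignores.

```c
#include <string.h>

#define CB_LEN 20	/* assumed size of the cb area visible to the program */

struct fake_skb {
	unsigned char cb[CB_LEN];	/* models the program-visible part of skb->cb */
};

typedef unsigned int (*fake_prog_fn)(struct fake_skb *skb);

/* "clear" flavour: zero cb before the run; the previous contents are lost. */
static unsigned int run_clear_cb(fake_prog_fn prog, struct fake_skb *skb)
{
	memset(skb->cb, 0, sizeof(skb->cb));
	return prog(skb);
}

/* "save" flavour: stash cb, hand the program a zeroed area, then restore. */
static unsigned int run_save_cb(fake_prog_fn prog, struct fake_skb *skb)
{
	unsigned char saved[CB_LEN];
	unsigned int ret;

	memcpy(saved, skb->cb, sizeof(saved));
	memset(skb->cb, 0, sizeof(skb->cb));
	ret = prog(skb);
	memcpy(skb->cb, saved, sizeof(saved));
	return ret;
}

/* Example program that scribbles on cb and lets the packet pass. */
static unsigned int scribble_prog(struct fake_skb *skb)
{
	skb->cb[0] = 0xff;
	return 1;
}
```

With the save flavour the caller gets cb back exactly as it was before the program ran, which presumably matters at this call site because the socket layer may still be using skb->cb. With the clear flavour whatever was stored there is gone.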