Re: [PATCH v6 5/6] net: ipv4, ipv6: run cgroup eBPF egress programs

From: Alexei Starovoitov <hidden>
Date: 2016-09-19 20:13:27
Also in: netdev

On Mon, Sep 19, 2016 at 09:19:10PM +0200, Pablo Neira Ayuso wrote:

On Mon, Sep 19, 2016 at 06:44:00PM +0200, Daniel Mack wrote:

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6001e78..5dc90aa 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c

@@ -39,6 +39,7 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 
+#include <linux/bpf-cgroup.h>
 #include <linux/netfilter.h>
 #include <linux/netfilter_ipv6.h>

@@ -143,6 +144,7 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 {
 	struct net_device *dev = skb_dst(skb)->dev;
 	struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb));
+	int ret;
 
 	if (unlikely(idev->cnf.disable_ipv6)) {
 		IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS);

@@ -150,6 +152,12 @@ int ip6_output(struct net *net, struct sock *sk, struct sk_buff *skb)
 		return 0;
 	}
 
+	ret = cgroup_bpf_run_filter(sk, skb, BPF_CGROUP_INET_EGRESS);
+	if (ret) {
+		kfree_skb(skb);
+		return ret;
+	}

1) If your goal is to filter packets, why so late? The sooner you
   enforce your policy, the fewer cycles you waste.

Actually, did you look at Google's approach to this problem? They
want to control this at the socket level, so you restrict what the
process can actually bind to. That enforces the policy well before you
even send packets. On top of that, what they submitted includes
infrastructure so that any process with CAP_NET_ADMIN can access the
policy being applied and fetch it in readable form through a kernel
interface.

2) I predict this will turn the stack into a nightmare to debug. If
   any process with CAP_NET_ADMIN can potentially attach bpf blobs
   via these hooks, we will have to include in the network stack

A process without CAP_NET_ADMIN can already attach bpf blobs to
system calls via seccomp. bpf is already used for security and policing.
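
For reference, the seccomp case can be sketched roughly as below. This is
only an illustration: the filter contents and the names deny_getppid and
install_filter are made up for this example and are not part of the patch
set. It relies on PR_SET_NO_NEW_PRIVS, which is what lets a process
without any capability install a filter on itself.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/*
 * Classic-BPF program (illustrative): make getppid() fail with EPERM
 * and allow every other system call. A real filter should also check
 * seccomp_data.arch; that is left out here for brevity.
 */
struct sock_filter deny_getppid[] = {
	/* A = system call number */
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
	/* if A == __NR_getppid go to the ERRNO return, else skip it */
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getppid, 0, 1),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Hypothetical helper: returns 0 on success or a negative errno. */
int install_filter(void)
{
	struct sock_fprog prog = {
		.len = sizeof(deny_getppid) / sizeof(deny_getppid[0]),
		.filter = deny_getppid,
	};

	/* Without CAP_SYS_ADMIN the kernel requires no_new_privs first. */
	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
		return -errno;
	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog))
		return -errno;
	return 0;
}
```

After install_filter() succeeds, a getppid() call from the same process
should fail with EPERM. The filter applies only to the calling process and
its children.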

   traveling documentation something like: "Probably you have to check
   that your orchestrator is not dropping your packets for some
   reason". So I wonder how users will debug this and how the policy that
   your orchestrator applies will be exposed to userspace.

As far as bpf debuggability/visibility goes, there are various efforts
under way.
For the kernel side:
- ksym for jit-ed programs
- hash sum for prog code
- compact type information for maps and various pretty printers
- data flow analysis of the programs
For user space:
- reconstruct the program in a high-level language from bpf asm
  (there is p4 to bpf, this effort is about bpf to p4)
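
To make the "hash sum for prog code" item concrete, here is a minimal
userspace sketch of the idea: a digest over the instruction stream, so that
two loaded programs can be compared without dumping them. Everything in it
is an assumption for illustration. struct demo_insn only mimics the 8-byte
bpf instruction layout, demo_prog_hash is a made-up name, and FNV-1a is used
purely for brevity; it is not a proposal for what the kernel should use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an 8-byte bpf instruction. */
struct demo_insn {
	uint8_t code;	/* opcode */
	uint8_t regs;	/* dst and src register, 4 bits each */
	int16_t off;	/* signed offset */
	int32_t imm;	/* signed immediate */
};

/* One FNV-1a step: xor the byte in, then multiply by the 64-bit prime. */
static uint64_t fnv1a_byte(uint64_t h, uint8_t b)
{
	return (h ^ b) * 0x100000001b3ULL;
}

/*
 * Hash the program field by field in little-endian byte order, so the
 * result does not depend on struct padding or host endianness.
 */
uint64_t demo_prog_hash(const struct demo_insn *insns, size_t cnt)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV-1a offset basis */
	size_t i;
	int shift;

	for (i = 0; i < cnt; i++) {
		uint16_t off = (uint16_t)insns[i].off;
		uint32_t imm = (uint32_t)insns[i].imm;

		h = fnv1a_byte(h, insns[i].code);
		h = fnv1a_byte(h, insns[i].regs);
		h = fnv1a_byte(h, (uint8_t)(off & 0xff));
		h = fnv1a_byte(h, (uint8_t)(off >> 8));
		for (shift = 0; shift < 32; shift += 8)
			h = fnv1a_byte(h, (uint8_t)((imm >> shift) & 0xff));
	}
	return h;
}
```

Identical instruction streams give the same value, and any change to a
single opcode, offset or immediate changes it. A real implementation would
also have to decide how to treat fields that differ between loads of the
same program, such as map file descriptors embedded in immediates.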
