Re: [RFC PATCH 0/5] Add eBPF hooks for cgroups

From: Daniel Mack <daniel@zonque.org>
Date: 2016-08-19 10:35:18

Hi Pablo,

On 08/19/2016 11:19 AM, Pablo Neira Ayuso wrote:

On Wed, Aug 17, 2016 at 04:00:43PM +0200, Daniel Mack wrote:


I'd appreciate some feedback on this. Pablo has some remaining concerns
about this approach, and I'd like to continue the discussion we had
off-list in the light of this patchset.

OK, I'm going to summarize them here below:

* This new hook

"This" refers to your alternative to my patch set, right?

allows us to enforce an *administrative filtering
  policy* that must be visible to anyone with CAP_NET_ADMIN. This is
  easy to display in nf_tables as you can list the ruleset via the nft
  userspace tool. Otherwise, in your approach if a misconfigured
  filtering policy causes connectivity problems, I don't see how the
  sysadmin is going to have an easy way to troubleshoot what is going on.

True. That's the downside of bpf.

* Interaction with other software. As I could read from your patch,
  what you propose will detach any previous existing filter. So I
  don't see how you can attach multiple filtering policies from
  different processes that don't cooperate with each other.

Also true. A cgroup can currently only hold one bpf program for each
direction, and they are supposed to be set from one controlling instance
in the system. However, it is possible to create subcgroups and install
separate programs in them, which will then be effective instead of the one in
the parent. They will, however, replace each other in runtime behavior,
and not be stacked. This is a fundamentally different approach than how
nf_tables works of course.
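
To make the replace-not-stack behaviour concrete, here is a minimal Python model of it. It is only a sketch, not kernel code: the `Cgroup` class, its method names, and the string program labels are hypothetical, and it assumes that a cgroup without its own program falls back to the nearest ancestor that has one.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Cgroup:
    """Hypothetical model of a cgroup holding at most one program per direction."""
    name: str
    parent: Optional["Cgroup"] = None
    progs: Dict[str, str] = field(default_factory=dict)

    def attach(self, direction: str, prog: str) -> None:
        # Attaching replaces whatever was installed for this direction.
        self.progs[direction] = prog

    def effective(self, direction: str) -> Optional[str]:
        # Walk up until a cgroup with its own program is found (assumed
        # fallback). Exactly one program is returned; nothing is stacked.
        node: Optional[Cgroup] = self
        while node is not None:
            if direction in node.progs:
                return node.progs[direction]
            node = node.parent
        return None


root = Cgroup("root")
child = Cgroup("child", parent=root)
root.attach("ingress", "parent_prog")
child.attach("ingress", "child_prog")
print(child.effective("ingress"))
```

In this model the child's program fully hides the parent's for sockets in the child, which is the contrast with nf_tables, where independent tables are all evaluated.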

In nf_tables
  this is easy since they can create their own tables so they keep their
  ruleset in separate spaces. If the interaction is not OK, again the
  sysadmin can very quickly debug this since the policies would be
  visible via nf_tables ruleset listing.

True. Debugging would be much easier that way.

So what I'm proposing goes in the direction of using the nf_tables
infrastructure instead:

* Add a new socket family for nf_tables with an input hook at
  sk_filter(). This just requires the new netfilter hook there and
  the boiler plate code to allow creating tables for this new family.
  And then we get access to many of the existing features in
  nf_tables for free.

Yes. However, when I proposed more or less exactly that back in
September last year ("NF_INET_LOCAL_SOCKET_IN"), the concern raised by
you and Florian Westphal was that this type of decision making is out of
scope for netfilter, mostly because

a) whether a userspace process is running should not have any influence
in the netfilter behavior (which it does, because the rules are not
processed when the local socket cannot be determined)

b) it is asymmetric, as it only exists for the input path

c) it's a change in netfilter paradigm, because rules for multicast
receivers are run multiple times (once for each receiving task)

d) it was considered a sledgehammer solution for something that very
few people really need


I still think such a hook would be a good thing to have. As far as
implementation goes, my patch set back then patched each of the
protocols individually (ipv4, ipv6, dccp, sctp), while your idea to hook
into sk_filter sounds much more reasonable.

If the opinions on the previously raised concerns have changed, I'm
happy to revisit.

* We can quickly find a verdict on the packet using any combination
  of selectors through concatenations and maps in nf_tables. In
  nf_tables we can express the policy with a non-linear ruleset.

That's another interesting detail that was discussed on NFWS, yes. We
need a way to dispatch incoming packets without walking a linear
dispatcher list. In the eBPF approach, that's very easy because the
cgroup is directly associated with the receiving socket, so the lookup
of the effective eBPF programs is really fast.

If we can achieve similar things with nf_tables and maps, then that
should be applicable as well.
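
The difference between walking a linear list and a keyed lookup can be sketched as follows. This is plain Python, not nft syntax; the function names, the packet dictionary layout, and the (protocol, destination port) key are assumptions chosen for illustration only.

```python
from typing import Callable, Dict, List, Optional, Tuple

Packet = Dict[str, object]
Rule = Tuple[Callable[[Packet], bool], str]


def linear_verdict(rules: List[Rule], pkt: Packet) -> Optional[str]:
    # Linear ruleset: every rule is tried in order until one matches,
    # so the cost grows with the number of rules.
    for match, verdict in rules:
        if match(pkt):
            return verdict
    return None


def map_verdict(vmap: Dict[Tuple[str, int], str], pkt: Packet) -> Optional[str]:
    # Map keyed on a concatenation of selectors: one lookup,
    # independent of how many entries the map holds.
    return vmap.get((pkt["proto"], pkt["dport"]))


rules: List[Rule] = [
    (lambda p: p["proto"] == "tcp" and p["dport"] == 22, "accept"),
    (lambda p: p["proto"] == "udp" and p["dport"] == 53, "drop"),
]
vmap = {("tcp", 22): "accept", ("udp", 53): "drop"}

pkt = {"proto": "udp", "dport": 53}
print(linear_verdict(rules, pkt), map_verdict(vmap, pkt))
```

Both functions return the same verdict for the same policy; only the lookup cost differs.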

On
  top of this, by delaying the nf_reset() calls we can reach the
  conntrack information from sk_filter(). That would be useful to skip
  evaluating packets that belong to already established flows. Thus, we
  incur the performance penalty in classifying only for the first
  packet of the flow.

If that's possible, that's an interesting feature, but at least for
accounting, we need to run the rules for all packets, always.
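
The two requirements can be pictured together in a small sketch: classify only the first packet of a flow, but account for every packet. This is an illustrative Python model, not how conntrack is implemented; `FlowCache`, its fields, and the flow tuple are hypothetical names, and a plain dictionary stands in for the connection tracking state.

```python
from collections import Counter
from typing import Callable, Dict, Hashable


class FlowCache:
    """Hypothetical stand-in for per-flow state kept by connection tracking."""

    def __init__(self, classify: Callable[[Hashable], str]) -> None:
        self.classify = classify
        self.verdicts: Dict[Hashable, str] = {}
        self.bytes: Counter = Counter()
        self.classified = 0  # how often the (expensive) classifier ran

    def handle(self, flow: Hashable, length: int) -> str:
        # Accounting runs for every packet, always.
        self.bytes[flow] += length
        # Classification runs only for the first packet of a flow.
        if flow not in self.verdicts:
            self.verdicts[flow] = self.classify(flow)
            self.classified += 1
        return self.verdicts[flow]


cache = FlowCache(lambda flow: "accept")
flow = ("10.0.0.1", 1234, "10.0.0.2", 80, "tcp")
for size in (100, 50, 25):
    cache.handle(flow, size)
print(cache.classified, cache.bytes[flow])
```

The point of the split is that the cached verdict saves the classification cost, while the byte counter still sees all traffic.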

* We can skip the socket egress hook (that you don't know where to place
  yet) since you can use the existing local output hook in netfilter that
  is available for IPv4 and IPv6.

If asymmetry is not a no-go anymore, that sounds fine to me.

* This new hook would fit into the existing netfilter set of hooks,
  the sysadmin is already familiarized with the administrative
  infrastructure to define filtering policies in our stack, so adding this
  new hook to what we have looks natural to me.

At least for inspecting the rules, this is certainly a benefit. On the
other hand, it's always been a pain to handle competing entities in the
system that both alter netfilter configurations, as ownership of rules
is suddenly not clear anymore.

Another concern I have with cgroup matching in netfilter (at least as
enforced by cgroup v2) is that every such rule has to carry a
char[PATH_MAX] struct member, and the matching is done via that path
string. I guess we need to come up with some solution in that area
that's less expensive here, but that could be solved separately.

So - I don't know. The whole 'eBPF in cgroups' idea was born because
through the discussions over the past months we had on all this, it
became clear to me that netfilter is not the right place for filtering
on local tasks. I agree the solution I am proposing in my patch set has
its downsides, mostly when it comes to transparency to users, but I
considered that acceptable. After all, we have eBPF users all over the
place in the kernel already, and seccomp, for instance, isn't any better
in that regard.

That said, if there is a better solution for the problem, I can as well
ditch my patches. It's ultimately your call anyway I guess :) Do you
have any plans on working on this new netfilter hook or do you want me
to have a look?


Thanks,
Daniel
