Thread: 50 messages, 8 authors, 2020-03-10

Re: [PATCH bpf-next 0/3] Introduce pinnable bpf_link kernel abstraction

From: Jakub Kicinski <kuba@kernel.org>
Date: 2020-03-05 08:16:23
Also in: bpf

On Wed, 4 Mar 2020 17:07:08 -0800, Alexei Starovoitov wrote:
quoted
Maybe also the thief should not have CAP_ADMIN in the first place?
And ask a daemon to perform its actions..  
a daemon idea keeps coming back in circles.
With FD-based kprobe/uprobe/tracepoint/fexit/fentry that problem is gone,
but xdp, tc, cgroup still don't have the owner concept.
Some people argued that these three need three separate daemons.
Especially since cgroups are mainly managed by systemd plus container
manager it's quite different from networking (xdp, tc) where something
like 'networkd' might make sense.
But if you take this line of thought all the way, systemd should be that
single daemon to coordinate attaching to xdp, tc, cgroup because
in many cases cgroup and tc progs have to coordinate the work.
The feature creep could happen, but Toke's proposal has a fairly simple
feature set, which should be easy to cover by a stand alone daemon.

Toke, I saw that in the library discussion there was no mention of 
a daemon, what makes a daemon solution unsuitable?
And that's where it's getting gloomy... unless the kernel can provide
a facility so that a central daemon is not necessary.
quoted
quoted
current xdp, tc, cgroup apis don't have the concept of the link
and owner of that link.  
Why do the attachment points have to have a concept of an owner and 
not the program itself?  
bpf program is an object. That object has an owner or multiple owners.
A user process that holds a pointer to that object is a shared owner.
FD is such pointer. FD == std::shared_ptr<bpf_prog>.
Holding that pointer guarantees that <bpf_prog> will not disappear,
but it says nothing about whether the program will keep running.
For [ku]probe,tp,fentry,fexit there was always <bpf_link> in the kernel.
It wasn't that formal in the past until Andrii's most recent patches,
but the concept has existed for a long time. FD == std::shared_ptr<bpf_link>
connects a kernel object with <bpf_prog>. When that kernel object emits
an event the <bpf_link> guarantees that <bpf_prog> will be executed.
I see, so the link is sort of [owner -> prog -> target].
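
The FD == std::shared_ptr analogy above can be sketched in user-space C++.
This is only an illustration of the ownership argument, not kernel code:
BpfProg, BpfLink, Target, attach(), attach_link() and emit_event() are all
hypothetical names made up for the sketch. It contrasts an ownerless attach
slot (last writer wins) with a link that stays in force for as long as
someone holds it.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loaded program; holding a shared_ptr to it
// plays the role of holding a prog FD.
struct BpfProg {
    std::string name;
    int runs = 0;
    explicit BpfProg(std::string n) : name(std::move(n)) {}
};

// Hypothetical stand-in for a link; holding a shared_ptr to it plays the
// role of holding a link FD. The link keeps its program alive.
struct BpfLink {
    std::shared_ptr<BpfProg> prog;
    explicit BpfLink(std::shared_ptr<BpfProg> p) : prog(std::move(p)) {}
};

// Hypothetical attachment point (cgroup, qdisc or netdev in the thread).
struct Target {
    std::shared_ptr<BpfProg> slot;              // ownerless attach slot
    std::vector<std::weak_ptr<BpfLink>> links;  // links owned by their holders

    // Ownerless attach: any caller can overwrite what is there.
    void attach(std::shared_ptr<BpfProg> p) { slot = std::move(p); }

    // Link-based attach: the caller receives the only owning reference.
    std::shared_ptr<BpfLink> attach_link(std::shared_ptr<BpfProg> p) {
        auto link = std::make_shared<BpfLink>(std::move(p));
        links.push_back(link);
        return link;
    }

    // Run whatever is in the slot, plus every link that is still held.
    void emit_event() {
        if (slot) ++slot->runs;
        for (auto& weak : links)
            if (auto link = weak.lock()) ++link->prog->runs;
    }
};

// Program "a" stays alive (we hold it) but stops running once replaced.
int demo_ownerless() {
    Target dev;
    auto a = std::make_shared<BpfProg>("a");
    auto b = std::make_shared<BpfProg>("b");
    dev.attach(a);
    dev.emit_event();
    dev.attach(b);  // another process replaces the attachment
    dev.emit_event();
    return a->runs;
}

// With a held link, the same replacement does not stop "a" from running.
int demo_link() {
    Target dev;
    auto a = std::make_shared<BpfProg>("a");
    auto b = std::make_shared<BpfProg>("b");
    auto link = dev.attach_link(a);
    assert(link->prog == a);
    dev.emit_event();
    dev.attach(b);  // touches only the ownerless slot
    dev.emit_event();
    return a->runs;
}
```

In this model demo_ownerless() returns 1 and demo_link() returns 2: holding
the program alone does not keep it running, while holding the link does.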
For cgroups we don't have such concept. We thought that three attach modes we
introduced (default, allow-override, allow-multi) will cover all use cases. But
in practice it turned out that it only works when there is a central daemon for
_all_ cgroup-bpf progs in the system, otherwise different processes step on each
other. More so, there has to be a central diff-review human authority, otherwise
teams step on each other. That sort of works within one org, but doesn't
scale.

To avoid making systemd a central place to coordinate attaching xdp, tc, cgroup
progs the kernel has to provide a mechanism for an application to connect a
kernel object with a prog and hold the ownership of that link so that no other
process in the system can break that connection. 
To me for XDP the promise that nothing breaks the connection cannot be
made without a daemon, because without the daemon the link has to be
available somewhere/pinned to make changes to, and therefore is no
longer safe. (Lock but with a key right next to it, in the previous
analogies.)

And a daemon IMHO can just monitor the changes. No different from how we
would monitor for applications fiddling with any other networking state,
addresses, routes, device config, you name it. XDP changes already fire
a link change notification, and that has probably been there since day one.
That kernel object is cgroup,
qdisc, netdev. Interesting question comes when that object disappears. What to
do with the link? Two ways to solve it:
1. make link hold the object, so it cannot be removed.
2. destroy the link when object goes away.
Both have pros and cons as I mentioned earlier. And that's what's to be decided.
I think the truth is somewhat in the middle. The link has to hold the object,
so it doesn't disappear from under it, but get notified on deletion, so the
link can be self destroyed. From the user point of view the execution guarantee
is still preserved. The kernel object was removed and the link has one dangling
side. Note this behavior is vastly different from existing xdp, tc, cgroup
behavior where both object and bpf prog can be alive, but connection is gone
and execution guarantee is broken.
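
The "somewhat in the middle" option described above (the link holds the
object, is notified when the object is deleted, and is left with one dangling
side) can be sketched in the same user-space C++ terms. Again, this is an
analogy under stated assumptions rather than the kernel implementation:
BpfProg, Target, Link, on_target_removed() and remove_target() are
hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

struct BpfProg { int runs = 0; };  // hypothetical program stand-in
struct Link;

// Hypothetical kernel object (cgroup, qdisc or netdev).
struct Target {
    std::vector<Link*> links;  // links to notify on deletion
    void emit_event();
};

struct Link {
    std::shared_ptr<BpfProg> prog;
    std::shared_ptr<Target> target;  // holds the object while attached

    Link(std::shared_ptr<BpfProg> p, std::shared_ptr<Target> t)
        : prog(std::move(p)), target(std::move(t)) {
        assert(prog && target);
        target->links.push_back(this);
    }
    Link(const Link&) = delete;
    Link& operator=(const Link&) = delete;
    ~Link() {
        if (target) {  // still attached: unregister from the object
            auto& v = target->links;
            v.erase(std::remove(v.begin(), v.end(), this), v.end());
        }
    }

    // Deletion notification: drop the reference, leaving one dangling side.
    void on_target_removed() { target.reset(); }
    bool dangling() const { return target == nullptr; }
};

inline void Target::emit_event() {
    for (Link* l : links) ++l->prog->runs;
}

// The user deletes the object. Taking the shared_ptr by value keeps the
// object alive until every link has been notified.
inline void remove_target(std::shared_ptr<Target> t) {
    std::vector<Link*> notified;
    notified.swap(t->links);
    for (Link* l : notified) l->on_target_removed();
}

// Returns the program's run count if the link ended up dangling and the
// object was actually freed, or -1 otherwise.
int demo_target_removed() {
    auto dev = std::make_shared<Target>();
    auto prog = std::make_shared<BpfProg>();
    Link link(prog, dev);
    dev->emit_event();  // the program runs while the object exists
    std::weak_ptr<Target> watch = dev;
    remove_target(std::move(dev));
    bool freed = watch.expired();
    return (link.dangling() && freed) ? prog->runs : -1;
}
```

Here demo_target_removed() returns 1: the program ran for every event the
object emitted, and once the object was removed the link was simply left with
nothing on its target side.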