Thread (120 messages) 120 messages, 12 authors, 2020-05-13

Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP

From: Andrii Nakryiko <hidden>
Date: 2020-03-26 19:06:28
Also in: bpf

On Thu, Mar 26, 2020 at 5:35 AM Toke Høiland-Jørgensen [off-list ref] wrote:
Andrii Nakryiko [off-list ref] writes:
quoted
Now for XDP. It has same flawed model. And even if it seems to you
that it's not a big issue, and even if Jakub thinks we are trying to
solve non-existing problem, it is a real problem and a real concern
from people that have to support XDP in production with many
well-meaning developers developing BPF applications independently.
Copying what you wrote in another thread:
quoted
Setting aside the question of which is the best abstraction to represent
an attachment, it seems to me that the actual behavioural problem (XDP
programs being overridden by mistake) would be solvable by this patch,
assuming well-behaved userspace applications.
... this is a horrible and unrealistic assumption that we just cannot
make and accept. However well-behaved userspace applications are, they
are written by people that make mistakes. And rather than blissfully
expect that everything will be fine, we want to have enforcements in
place that will prevent some buggy application to wreck havoc in
production.
Look, I'm not trying to tell you how to managed your internal systems.
I'm just objecting to your assertion that your deployment model is the
only one that can possibly work, and the refusal to consider other
alternatives that comes with it.
Your assumption doesn't work for us. Because of that we need something
like bpf_link. Existing attachment API doesn't go away and is still
supported. Feel free to use existing API. As for EXPECTED_FD API you
are adding, it will be up to maintainers to decide, ultimately, I
can't block it, even if I wanted to.
quoted
quoted
You're saying that like we didn't already have the netlink API. We
essentially already have (the equivalent of) LINK_CREATE and LINK_QUERY,
this is just adding LINK_UPDATE. It's a straight-forward fix of an
existing API; essentially you're saying we should keep the old API in a
crippled state in order to promote your (proposed) new API.
This is the fundamental disagreement that we seem to have. XDP's BPF
program attachment is not in any way equivalent to bpf_link. So no,
netlink API currently doesn't have anything that's close to bpf_link.
Let me try to summarize what bpf_link is and what are its fundamental
properties regardless of type of BPF programs.
First of all, thank you for this summary; that is very useful!
Sure, you're welcome.
quoted
1. bpf_link represents a connection (pairing?) of BPF program and some
BPF hook it is attached to. BPF hook could be perf event, cgroup,
netdev, etc. It's a completely independent object in itself, along the
bpf_map and bpf_prog, which has its own lifetime and kernel
representation. To user-space application it is returned as an
installed FD, similar to loaded BPF program and BPF map. It is
important that it's not just a BPF program, because BPF program can be
attached to multiple BPF hooks (e.g., same XDP program can be attached
to multiple interface; same kprobe handler can be installed multiple
times), which means that having BPF program FD isn't enough to
uniquely represent that one specific BPF program attachment and detach
it or query it. Having kernel object for this allows to encapsulate
all these various details of what is attached were and present to
user-space a single handle (FD) to work with.
For XDP there is already a unique handle, it's just implicit: Each
netdev can have exactly one XDP program loaded. So I don't really see
how bpf_link adds anything, other than another API for the same thing?
I certainly failed to explain things clearly if you are still asking
this. See point #2, once you attach bpf_link you can't just replace
it. This is what XDP doesn't have right now.

It's a game of picking features/properties in isolation and "we can do
this particular thing this different way with what we have". Please,
try consider all of it together, it's important. Every single aspect
of bpf_link is not that unique, but it's all of them together that
matter.
quoted
2. Due to having FD associated with bpf_link, it's not possible to
talk about "owning" bpf_link. If application created link and never
shared its FD with any other application, it is the sole owner of it.
But it also means that you can share it, if you need it. Now, once
application closes FD or app crashes and kernel automatically closes
that FD, bpf_link refcount is decremented. If it was the last or only
FD, it will trigger automatica detachment and clean up of that
particular BPF program attachment. Note, not a clean up of BPF
program, which can still be attached somewhere else: only that
particular attachment.
This behaviour is actually one of my reservations against bpf_link for
XDP: I think that automatically detaching XDP programs when the FD is
closed is very much the wrong behaviour. An XDP program processes
packets, and when loading one I very much expect it to keep doing that
until I explicitly tell it to stop.
As you mentioned earlier, "it's not the only one mode". Just like with
tracing APIs, you can imagine scripts that would adds their
packet-sniffing XDP program temporarily. If they crash, "temporarily"
turns into "permanently, but no one knows". This is bad. And again,
it's a choice, just with a default to auto-cleanup, because it's safe,
even if it requires extra step for applications willing to do
permanent XDP attachment.
quoted
3. This derives from the concept of ownership of bpf_link. Once
bpf_link is attached, no other application that doesn't own that
bpf_link can replace, detach or modify the link. For some cases it
doesn't matter. E.g., for tracing, all attachment to the same fentry
trampoline are completely independent. But for other cases this is
crucial property. E.g., when you attach BPF program in an exclusive
(single) mode, it means that particular cgroup and any of its children
cgroups can have any more BPF programs attached. This is important for
container management systems to enforce invariants and correct
functioning of the system. Right now it's very easy to violate that -
you just go and attach your own BPF program, and previous BPF program
gets automatically detached without original application that put it
there knowing about this. Chaos ensues after that and real people have
to deal with this. Which is why existing
BPF_PROG_ATTACH/BPF_PROG_DETACH API is inadequate and we are adding
bpf_link support.
I can totally see how having an option to enforce a policy such as
locking out others from installing cgroup BPF programs is useful. But
such an option is just that: policy. So building this policy in as a
fundamental property of the API seems like a bad idea; that is
effectively enforcing policy in the kernel, isn't it?
I hope we won't go into a dictionary definition of what "policy" means
here :). For me it's about guarantee that kernel gives to user-space.
bpf_link doesn't care about dictating policies. If you don't want this
guarantee - don't use bpf_link, use direct program attachment. As
simple as that. Policy is implemented by user-space application by
using APIs with just the right guarantees.
quoted
Those same folks have similar concern with XDP. In the world where
container management installs "root" XDP program which other user
applications can plug into (libxdp use case, right?), it's crucial to
ensure that this root XDP program is not accidentally overwritten by
some well-meaning, but not overly cautious developer experimenting in
his own container with XDP programs. This is where bpf_link ownership
plays a huge role. Tupperware agent (FB's container management agent)
would install root XDP program and will hold onto this bpf_link
without sharing it with other applications. That will guarantee that
the system will be stable and can't be compromised.
See this is where we get into "deployment-model specific territory". I
mean, sure, in the "central management daemon" model, it makes sense
that no other applications can replace the XDP program. But, erm, we
already have a mechanism to ensure that: Just don't grant those
applications CAP_NET_ADMIN? So again, bpf_link doesn't really seem to
add anything other than a different way to do the same thing?
Because there are still applications that need CAP_NET_ADMIN in order
to function (for other reasons than attaching XDP), so it's impossible
to enforce with for everyone.
Additionally, in the case where there is *not* a central management
daemon (i.e., what I'm implementing with libxdp), this would be the flow
implemented by the library without bpf_link:

1. Query kernel for current BPF prog loaded on $IFACE
2. Sanity-check that this program is a dispatcher program installed by
   libxdp
3. Create a new dispatcher program with whatever changes we want to do
   (such as adding another component program).
4. Atomically replace the old program with the new one using the netlink
   API in this patch series.

Whereas with bpf_link, it would be:

1. Find the pinned bpf_link for $IFACE (e.g., load from
   /sys/fs/bpf/iface-links/$IFNAME).
But now you can hide this mount point from containerized
root/CAP_NET_ADMIN application, can't you? See the difference? One
might think about bpf_link as a fine-grained capability in this sense.

2. Query kernel for current BPF prog linked to $LINK
3. Sanity-check that this program is a dispatcher program installed by
   libxdp
4. Create a new dispatcher program with whatever changes we want to do
   (such as adding another component program).
5. Atomically replace the old program with the new one using the
   LINK_UPDATE bpf() API.


So all this does is add an additional step, and another dependency on
bpffs. And crucially, I really don't see how the "bpf_link is the only
thing that is not fundamentally broken" argument holds up.

-Toke
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help