Thread (120 messages, 12 authors, 2020-05-13)

Re: [PATCH bpf-next 1/4] xdp: Support specifying expected existing program when attaching XDP

From: Toke Høiland-Jørgensen <hidden>
Date: 2020-03-25 09:38:41
Also in: bpf

Andrii Nakryiko [off-list ref] writes:
On Tue, Mar 24, 2020 at 3:57 AM Toke Høiland-Jørgensen [off-list ref] wrote:
Andrii Nakryiko [off-list ref] writes:
On Mon, Mar 23, 2020 at 12:23 PM Toke Høiland-Jørgensen [off-list ref] wrote:
Andrii Nakryiko [off-list ref] writes:
On Mon, Mar 23, 2020 at 4:24 AM Toke Høiland-Jørgensen [off-list ref] wrote:
Andrii Nakryiko [off-list ref] writes:
On Fri, Mar 20, 2020 at 11:31 AM John Fastabend
[off-list ref] wrote:
Jakub Kicinski wrote:
On Fri, 20 Mar 2020 09:48:10 +0100 Toke Høiland-Jørgensen wrote:
Jakub Kicinski [off-list ref] writes:
On Thu, 19 Mar 2020 14:13:13 +0100 Toke Høiland-Jørgensen wrote:
From: Toke Høiland-Jørgensen <redacted>

While it is currently possible for userspace to specify that an existing
XDP program should not be replaced when attaching to an interface, there is
no mechanism to safely replace a specific XDP program with another.

This patch adds a new netlink attribute, IFLA_XDP_EXPECTED_FD, which can be
set along with IFLA_XDP_FD. If set, the kernel will check that the program
currently loaded on the interface matches the expected one, and fail the
operation if it does not. This corresponds to a 'cmpxchg' memory operation.

A new companion flag, XDP_FLAGS_EXPECT_FD, is also added to explicitly
request checking of the EXPECTED_FD attribute. This is needed for userspace
to discover whether the kernel supports the new attribute.

Signed-off-by: Toke Høiland-Jørgensen <redacted>
I didn't know we wanted to go ahead with this...
Well, I'm aware of the bpf_link discussion, obviously. Not sure what's
happening with that, though. So since this is a straightforward
extension of the existing API that doesn't carry a high implementation
cost, I figured I'd just go ahead with this. Doesn't mean we can't have
something similar in bpf_link as well, of course.
I'm not really in the loop, but from what I overheard, I think
bpf_link may be targeting something non-networking first.
My preference is to avoid building two different APIs, one for XDP and another
for everything else. If we have userlands that already understand links, and
pinning support is on the way, IMO let's use these APIs for networking as well.
I agree here. And yes, I've been working on extending bpf_link into
cgroup and then to XDP. We are still discussing some cgroup-specific
details, but the patch is ready. I'm going to post it as an RFC to get
the discussion started, before we do this for XDP.
Well, my reason for being skeptical about bpf_link and proposing the
netlink-based API is actually exactly this, but in reverse: With
bpf_link we will be in the situation that everything related to a netdev
is configured over netlink *except* XDP.
One can argue that everything related to the use of BPF is going to be
uniform and done through the BPF syscall? Given the variety of possible BPF
hooks/targets, using a custom way to attach for each of those many cases is
really bad as well, so having a unifying concept and a single entry point to
do this is good, no?
Well, it depends on how you view the BPF subsystem's relation to the
rest of the kernel, I suppose. I tend to view it as a subsystem that
provides a bunch of functionality, which you can set up (using "internal"
BPF APIs), and then attach that object to a different subsystem
(networking) using that subsystem's configuration APIs.

Seeing as this really boils down to a matter of taste, though, I'm not
sure we'll find agreement on this :)
Yeah, seems like so. But then again, your view and reality don't seem
to correlate completely. cgroup, a lot of tracing, and
flow_dissector/lirc_mode2 attachments are all done through the BPF
syscall.
Well, I wasn't talking about any of those subsystems, I was talking
about networking :)
So it's not "BPF subsystem's relation to the rest of the kernel" from
your previous email, it's now only "talking about networking"? Since
when is the rest of the kernel networking?
Not really, I would likely argue the same for any other subsystem, I
just prefer to limit myself to talking about things I actually know
something about. Hence, networking :)
But anyways, I think John addressed modern XDP networking issues in
his email very well already.
Going to reply to that email next...
In particular, networking already has a consistent and fairly
well-designed configuration mechanism (i.e., netlink) that we are
generally trying to move more functionality *towards* not *away from*
(see, e.g., converting ethtool to use netlink).
LINK_CREATE provides an opportunity to finally unify all those
different ways to achieve the same "attach my BPF program to some
target object" semantics.
Well I also happen to think that "attach a BPF program to an object" is
the wrong way to think about XDP. Rather, in my mind the model is
"instruct the netdevice to execute this piece of BPF code".
That can't be reconciled, so there's no point arguing :) But thinking about
BPF in general, I think it's closer to the "attach a BPF program" way of
thinking (especially for fentry/fexit, kprobe, etc.), where the objects BPF
is attached to are not "active" in the sense of "calling BPF"; it's
more that the BPF subsystem sets things up (attaches?) in such a way that
the BPF program is executed when appropriate.
I'd tend to agree with you on most of the tracing stuff, but not on
this. But let's just agree to disagree here :)
Other than that, I don't see any reason why the bpf_link API won't work.
So if no one else has any problem with BPF insisting on
being a special snowflake, I guess I can live with it as well... *shrugs* :)
Apart from the derogatory remark,
Yeah, should have left out the 'snowflake' bit, sorry about that...
BPF is a bit special here, because it requires every potential BPF
hook (be it cgroups, xdp, perf_event, etc.) to be aware of BPF
program(s) and execute them with a special macro. So, like it or not, it
is special, and each driver supporting BPF needs to implement this BPF
wiring.
All that is about internal implementation, though. I'm bothered by the
API discrepancy (i.e., from the user PoV we'll end up with: "netlink is
what you use to configure your netdev except if you want to attach an
XDP program to it").
See my reply to David. It depends on where you define the user API. Is it
the libbpf API, which is what most users are using? Or the kernel API?
Well I'm talking about the kernel<->userspace API, obviously :)
If everyone is using libbpf, does the kernel interface (bpf syscall vs.
netlink) matter all that much?
This argument works the other way as well, though: If libbpf can
abstract the subsystem differences and provide a consistent interface to
"the BPF world", why does BPF need to impose its own syscall API on the
networking subsystem?
bpf_link in libbpf started as a user-space abstraction only, but we
realized that it's not enough and there is a need for proper
kernel support and a corresponding kernel object, so it's not just
a matter of user-space API concerns.

As for having a netlink interface for creating a link only for XDP: why
duplicate and maintain two interfaces?
Totally agree; why do we need two interfaces? Let's keep the one we
already have - the netlink interface! :)
All the other subsystems will go through the bpf syscall; only XDP wants
to (also) have this through netlink. This means duplicating UAPI
for no added benefit. It's the LINK_CREATE operation, as well as the
LINK_UPDATE operation. Do we need to duplicate LINK_QUERY (once it's
implemented)? What if we'd like to support some other generic bpf_link
functionality: would it be OK to add it only to the bpf syscall, or would we
need to duplicate it in netlink as well?
You're saying that as if we didn't already have the netlink API. We
essentially already have (the equivalent of) LINK_CREATE and LINK_QUERY;
this is just adding LINK_UPDATE. It's a straightforward fix of an
existing API; essentially you're saying we should keep the old API in a
crippled state in order to promote your (proposed) new API.

-Toke