Thread (50 messages) 50 messages, 11 authors, 2022-11-03

Re: [RFC bpf-next 0/5] xdp: hints via kfuncs

From: John Fastabend <john.fastabend@gmail.com>
Date: 2022-10-28 23:16:23
Also in: bpf

Stanislav Fomichev wrote:
On Fri, Oct 28, 2022 at 11:05 AM Jakub Kicinski [off-list ref] wrote:
quoted
On Fri, 28 Oct 2022 08:58:18 -0700 John Fastabend wrote:
quoted
A bit of extra commentary. By exposing the raw kptr to the rx
descriptor we don't need driver writers to do anything.
And can easily support all the drivers out the gate with simple
one or two line changes. This pushes the interesting parts
into userspace and then BPF writers get to do the work without
bother driver folks and also if its not done today it doesn't
matter because user space can come along and make it work
later. So no scattered kernel dependencies which I really
would like to avoid here. Its actually very painful to have
to support clusters with N kernels and M devices if they
have different features. Doable but annoying and much nicer
if we just say 6.2 has support for kptr rx descriptor reading
and all XDP drivers support it. So timestamp, rxhash work
across the board.
IMHO that's a bit of wishful thinking. Driver support is just a small
piece, you'll have different HW and FW versions, feature conflicts etc.
In the end kernel version is just one variable and there are many others
you'll already have to track.
Agree.
quoted
And it's actually harder to abstract away inter HW generation
differences if the user space code has to handle all of it.
I don't see how its any harder in practice though?
I've had the same concern:

Until we have some userspace library that abstracts all these details,
it's not really convenient to use. IIUC, with a kptr, I'd get a blob
of data and I need to go through the code and see what particular type
it represents for my particular device and how the data I need is
represented there. There are also these "if this is device v1 -> use
v1 descriptor format; if it's a v2->use this another struct; etc"
complexities that we'll be pushing onto the users. With kfuncs, we put
this burden on the driver developers, but I agree that the drawback
here is that we actually have to wait for the implementations to catch
up.
I agree with everything there, you will get a blob of data and then
will need to know what field you want to read using BTF. But, we
already do this for BPF programs all over the place so its not a big
lift for us. All other BPF tracing/observability requires the same
logic. I think users of BPF in general perhaps XDP/tc are the only
place left to write BPF programs without thinking about BTF and
kernel data structures.

But, with proposed kptr the complexity lives in userspace and can be
fixed, added, updated without having to bother with kernel updates, etc.
From my point of view of supporting Cilium its a win and much preferred
to having to deal with driver owners on all cloud vendors, distributions,
and so on.

If vendor updates firmware with new fields I get those immediately.
Jakub mentions FW and I haven't even thought about that; so yeah, bpf
programs might have to take a lot of other state into consideration
when parsing the descriptors; all those details do seem like they
belong to the driver code.
I would prefer to avoid being stuck on requiring driver writers to
be involved. With just a kptr I can support the device and any
firwmare versions without requiring help.
Feel free to send it early with just a handful of drivers implemented;
I'm more interested about bpf/af_xdp/user api story; if we have some
nice sample/test case that shows how the metadata can be used, that
might push us closer to the agreement on the best way to proceed.
I'll try to do a intel and mlx implementation to get a cross section.
I have a good collection of nics here so should be able to show a
couple firmware versions. It could be fine I think to have the raw
kptr access and then also kfuncs for some things perhaps.


quoted
quoted
To find the offset of fields (rxhash, timestamp) you can use
standard BTF relocations we have all this machinery built up
already for all the other structs we read, net_devices, task
structs, inodes, ... so its not a big hurdle at all IMO. We
can add userspace libs if folks really care, but its just a read so
I'm not even sure that is helpful.

I think its nicer than having kfuncs that need to be written
everywhere. My $.02 although I'll poke around with below
some as well. Feel free to just hang tight until I have some
code at the moment I have intel, mellanox drivers that I
would want to support.
I'd prefer if we left the door open for new vendors. Punting descriptor
parsing to user space will indeed result in what you just said - major
vendors are supported and that's it.
I'm not sure about why it would make it harder for new vendors? I think
the opposite, it would be easier because I don't need vendor support
at all. Thinking it over seems there could be room for both.


Thanks!
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help