Thread (98 messages) 98 messages, 14 authors, 2024-08-21

Re: [xdp-hints] Re: [PATCH RFC bpf-next 00/52] bpf, xdp: introduce and use Generic Hints/metadata

From: Toke Høiland-Jørgensen <hidden>
Date: 2022-06-29 13:43:32
Also in: bpf, lkml

John Fastabend [off-list ref] writes:
Alexander Lobakin wrote:
quoted
This RFC is to give the whole picture. It will most likely be split
onto several series, maybe even merge cycles. See the "table of
contents" below.
Even for RFC its a bit much. Probably improve the summary
message here as well I'm still not clear on the overall
architecture so not sure I want to dig into patches.
+1 on this, and piggybacking on your comment to chime in on the general
architecture.
quoted
Now, a NIC driver, or even a SmartNIC itself, can put those params
there in a well-defined format. The format is fixed, but can be of
several different types represented by structures, which definitions
are available to the kernel, BPF programs and the userland.
I don't think in general the format needs to be fixed.
No, that's the whole point of BTF: it's not supposed to be UAPI, we'll
use CO-RE to enable dynamic formats...

[...]
quoted
It is fixed due to it being almost a UAPI, and the exact format can
be determined by reading the last 10 bytes of metadata. They contain
a 2-byte magic ID to not confuse it with a non-compatible meta and
a 8-byte combined BTF ID + type ID: the ID of the BTF where this
structure is defined and the ID of that definition inside that BTF.
Users can obtain BTF IDs by structure types using helpers available
in the kernel, BPF (written by the CO-RE/verifier) and the userland
(libbpf -> kernel call) and then rely on those ID when reading data
to make sure whether they support it and what to do with it.
Why separate magic and ID? The idea is to make different formats
always contain the basic/"generic" structure embedded at the end.
This way we can still benefit in purely generic consumers (like
cpumap) while providing some "extra" data to those who support it.
I don't follow this. If you have a struct in your driver name it
something obvious, ice_xdp_metadata. If I understand things
correctly just dump the BTF for the driver, extract the
struct and done you can use CO-RE reads. For the 'fixed' case
this looks easy. And I don't think you even need a patch for this.
...however as we've discussed previously, we do need a bit of
infrastructure around this. In particular, we need to embed the embed
the BTF ID into the metadata itself so BPF can do runtime disambiguation
between different formats (and add the right CO-RE primitives to make
this easy). This is for two reasons:

- The metadata might be different per-packet (e.g., PTP packets with
  timestamps interleaved with bulk data without them)

- With redirects we may end up processing packets from different devices
  in a single XDP program (in devmap or cpumap, or on a veth) so we need
  to be able to disambiguate at runtime.

So I think the part of the design that puts the BTF ID into the end of
the metadata struct is sound; however, the actual format doesn't have to
be fixed, we can use CO-RE to pick out the bits that a given BPF program
needs; we just need a convention for how drivers report which format(s)
they support. Which we should also agree on (and add core infrastructure
around) so each driver doesn't go around inventing their own
conventions.
quoted
The enablement of this feature is controlled on attaching/replacing
XDP program on an interface with two new parameters: that combined
BTF+type ID and metadata threshold.
The threshold specifies the minimum frame size which a driver (or
NIC) should start composing metadata from. It is introduced instead
of just false/true flag due to that often it's not worth it to spend
cycles to fetch all that data for such small frames: let's say, it
can be even faster to just calculate checksums for them on CPU
rather than touch non-coherent DMA zone. Simple XDP_DROP case loses
15 Mpps on 64 byte frames with enabled metadata, threshold can help
mitigate that.
I would put this in the bonus category. Can you do the simple thing
above without these extra bits and then add them later. Just
pick some overly conservative threshold to start with.
Yeah, I'd agree this kind of configuration is something that can be
added later, and also it's sort of orthogonal to the consumption of the
metadata itself.

Also, tying this configuration into the loading of an XDP program is a
terrible interface: these are hardware configuration options, let's just
put them into ethtool or 'ip link' like any other piece of device
configuration.
quoted
The RFC can be divided into 8 parts:
I'm missing something why not do the simplest bit of work and
get this running in ice with a few smallish driver updates
so we can all see it. No need for so many patches.
Agreed. This incremental approach is basically what Jesper's
simultaneous series makes a start on, AFAICT? Would be nice if y'all
could converge the efforts :)

[...]
I really think your asking questions that are two or three
jumps away. Why not do the simplest bit first and kick
the driver with an on/off switch into this mode. But
I don't understand this cpumap use case so maybe explain
that first.

And sorry didn't even look at your 50+ patches. Figure lets
get agreement on the goal first.
+1 on both of these :)

-Toke
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help