Thread (45 messages) 45 messages, 8 authors, 2024-10-08

Re: [RFC bpf-next 0/4] Add XDP rx hw hints support performing XDP_REDIRECT

From: Stanislav Fomichev <hidden>
Date: 2024-09-27 01:43:11
Also in: bpf, intel-wired-lan

On 09/26, Lorenzo Bianconi wrote:
quoted
Lorenzo Bianconi [off-list ref] writes:
quoted
quoted
I'm hinting at some complications here (with the EFAULT return) that
needs to be resolved: there is no guarantee that a given packet will be
in sync with the current status of the registered metadata, so we need
explicit checks for this. If metadata entries are de-registered again
this also means dealing with holes and/or reshuffling the metadata
layout to reuse the released space (incidentally, this is the one place
where a TLV format would have advantages).
I like this approach but it seems to me more suitable for 'sw' metadata
(this is main Arthur and Jakub use case iiuc) where the userspace would
enable/disable these functionalities system-wide.
Regarding device hw metadata (e.g. checksum offload) I can see some issues
since on a system we can have multiple NICs with different capabilities.
If we consider current codebase, stmmac driver supports only rx timestamp,
while mlx5 supports all of them. In a theoretical system with these two
NICs, since pkt_metadata_registry is global system-wide, we will end-up
with quite a lot of holes for the stmmac, right? (I am not sure if this
case is relevant or not). In other words, we will end-up with a fixed
struct for device rx hw metadata (like xdp_rx_meta). So I am wondering
if we really need all this complexity for xdp rx hw metadata?
Well, the "holes" will be there anyway (in your static struct approach).
They would just correspond to parts of the "struct xdp_rx_meta" being
unset.
yes, what I wanted to say is I have the feeling we will end up 90% of the
times in the same fields architecture and the cases where we can save some
space seem very limited. Anyway, I am fine to discuss about a common
architecture.
quoted
What the "userspace can turn off the fields system wide" would
accomplish is to *avoid* the holes if you know that you will never need
them. E.g., say a system administrator know that they have no networks
that use (offloaded) VLANs. They could then disable the vlan offload
field system-wide, and thus reclaim the four bytes taken up by the
"vlan" member of struct xdp_rx_meta, freeing that up for use by
application metadata.
Even if I like the idea of having a common approach for this kernel feature,
hw metadata seems to me quite a corner case with respect of 'user-defined
metadata', since:
- I do not think it is a common scenario to disable hw offload capabilities
  (e.g checksum offload in production)
- I guess it is not just enough to disable them via bpf, but the user/sysadmin
  will need even to configure the NIC via ethtool (so a 2-steps process).

I think we should pay attention to not overcomplicate something that is 99%
enabled and just need to be fast. E.g I can see an issue of putting the hw rx
metadata in metadata field since metadata grows backward and we will probably
end up putting them in a different cacheline with respect to xdp_frame
(xdp_headroom is usually 256B).
quoted
However, it may well be that the complexity of allowing fields to be
turned off is not worth the gains. At least as long as there are only
the couple of HW metadata fields that we have currently. Having the
flexibility to change our minds later would be good, though, which is
mostly a matter of making the API exposed to BPF and/or userspace
flexible enough to allow us to move things around in memory in the
future. Which was basically my thought with the API I sketched out in
the previous email. I.e., you could go:

ret = bpf_get_packet_metadata_field(pkt, METADATA_ID_HW_HASH,
                                    &my_hash_vlaue, sizeof(u32))
yes, my plan is to add dedicated bpf kfuncs to store hw metadata in
xdp_frame/xdp_buff
quoted

...and the METADATA_ID_HW_HASH would be a constant defined by the
kernel, for which the bpf_get_packet_metadata_field() kfunc just has a
hardcoded lookup into struct xdp_rx_meta. And then, if we decide to move
the field in the future, we just change the kfunc implementation, with
no impact to the BPF programs calling it.
maybe we can use what we Stanislav have already added (maybe removing xdp
prefix):

enum xdp_rx_metadata {
	XDP_METADATA_KFUNC_RX_TIMESTAMP,
	XDP_METADATA_KFUNC_RX_HASH,
	XDP_METADATA_KFUNC_RX_VLAN_TAG
};
I think it was one of the ideas floating around back then for the
xdp->skb case (including redirection): have an extra kfunc that the bpf
program can call and make this kfunc store the metadata (in the metadata area)
in some fixed format. Then the kernel can consume it.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help