Re: [PATCH net-next RFC v1 2/4] veth: use generic-XDP functions when dealing with SKBs
From: Jesper Dangaard Brouer <hawk@kernel.org>
Date: 2023-08-29 14:37:32
On 24/08/2023 12.30, Toke Høiland-Jørgensen wrote:
Jesper Dangaard Brouer [off-list ref] writes:quoted
The root-cause the realloc issue is that veth_xdp_rcv_skb() code path (that handles SKBs like generic-XDP) is calling a native-XDP function xdp_do_redirect(), instead of simply using xdp_do_generic_redirect() that can handle SKBs. The existing code tries to steal the packet-data from the SKB (and frees the SKB itself). This cause issues as SKBs can have different memory models that are incompatible with native-XDP call xdp_do_redirect(). For this reason the checks in veth_convert_skb_to_xdp_buff() becomes more strict. This in turn makes this a bad approach. Simply leveraging generic-XDP helpers e.g. generic_xdp_tx() and xdp_do_generic_redirect() as this resolves the issue given netstack can handle these different SKB memory models.While this does solve the memory issue, it's also a subtle change of semantics. For one thing, generic_xdp_tx() has this comment above it: /* When doing generic XDP we have to bypass the qdisc layer and the * network taps in order to match in-driver-XDP behavior. This also means * that XDP packets are able to starve other packets going through a qdisc, * and DDOS attacks will be more effective. In-driver-XDP use dedicated TX * queues, so they do not have this starvation issue. */ Also, more generally, this means that if you have a setup with XDP_REDIRECT-based forwarding in on a host with a mix of physical and veth devices, all the traffic originating from the veth devices will go on different TXQs than that originating from a physical NIC. Or if a veth device has a mix of xdp_frame-backed packets and skb-backed packets, those will also go on different queues, potentially leading to reordering.
Mixing xdp_frame-backed packets and skb-backed packet (towards veth) will naturally come from two different data paths, and the BPF-developer that redirected the xdp_frame (into veth) will have taken this choice, including the chance of reordering (given the two data/code paths). I will claim that (for SKBs) current code cause reordering on TXQs (as you explain), and my code changes actually fix this problem. Consider a userspace app (inside namespace) sending packets out (to veth peer). Routing (or bridging) will make netstack send out device A (maybe a physical device). On veth peer we have XDP-prog running, that will XDP-redirect every 2nd packet to device A. With current code TXQ reordering will occur, as calling "native" xdp_do_redirect() will select TXQ based on current-running CPU, while normal SKBs will use netdev_core_pick_tx(). After my change, using xdp_do_generic_redirect(), the code end-up using generic_xdp_tx() which (looking at the code) also use netdev_core_pick_tx() to select the TXQ. Thus, I will claim it is more correct (even-though XDP in general doesn't give this guarantee).
I'm not sure exactly how much of an issue this is in practice, but at least from a conceptual PoV it's a change in behaviour that I don't think we should be making lightly. WDYT?
As desc above, I think this patchset is an improvement. It might even fix/address the concern that was raised. [Outside the scope of this patchset] The single XDP BPF-prog getting attached to (RX-side) on a veth device, actually needs to handle *both* xdp_frame-backed packets and SKB-backed packets, and it cannot tell them apart. (Easy fix: implement a kfunc RX-metadata hint to expose this?). For the use-case[1] of implementing NFV (Network Function Virt) chaining via veth device, where each veth-pairs XDP BPF-prog implement a network "function" and redirect/chain to the next veth/container NFV. For this use-case, I would like the ability to either skip SKB-backed packet or turn off BPF-prog seeing any SKB-backed packets. There is a huge performance advantage when XDP-redirecting an xdp_frame into veth devices in this way, approx 6Mpps for traversing 4 veth devices as benchmarked in [1]. (p.s. I was going to improve this performance further, but I got distracted by other work). [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp_frame03_overhead.org The veth-NFV like use-cases are hampered by the SKB-based XDP code-path causing a significant slowdown for normal netstack packets. Plus, it need to parse-and-filter those SKB-based packets too. This, patchset "just" significantly reduce the overhead of the SKB-based XDP code path, which IMHO is a good first step. Then we can discuss if should have a switch to turn off the SKB-based XDP code-path in veth, afterwards. --Jesper