Re: [PATCH bpf-next V2 0/7] xdp: Allow BPF to set RX hints for XDP_REDIRECTed packets
From: Jesper Dangaard Brouer <hawk@kernel.org>
Date: 2025-07-29 11:15:59
On 28/07/2025 18.29, Jakub Kicinski wrote:
> On Mon, 28 Jul 2025 12:53:01 +0200 Lorenzo Bianconi wrote:
>>>> I can see why you might think that, but from my perspective, the
>>>> xdp_frame *is* the implementation of the mini-SKB concept. We've
>>>> been building it incrementally for years. It started as the most
>>>> minimal structure possible and has gradually gained more context
>>>> (e.g. dev_rx, mem_info/rxq_info, flags, and also uses
>>>> skb_shared_info with same layout as SKB).
>>>
>>> My understanding was that just adding all the fields to xdp_frame
>>> was considered too wasteful. Otherwise we would have done something
>>> along those lines ~10 years ago :S
>>
>> Hi Jakub, sorry for the late reply.
>
> Same, back from vacation.
>
>> I am completely fine to redesign the solution to overcome the problem
>> but I guess this feature will allow us to improve XDP performance in
>> a common/real use-case. Let's consider we want to redirect a packet
>> into a veth and then into a container. Preserving the hw metadata
>> performing XDP_REDIRECT will allow us to avoid recalculating the
>> checksum creating the skb. This will result in a very nice
>> performance improvement. So I guess we should really come up with
>> some idea to add this missing feature.
>
> Martin mentioned to me that he had proposed in the past that we allow
> allocating the skb at the XDP level, if the program needs "skb-level
> metadata". That actually seems pretty clean to me.. Was it ever
> explored?

That idea has been considered before, but it unfortunately doesn't work
from a performance angle. The performance model of XDP_REDIRECT into
CPUMAP relies on moving the expensive SKB allocation+init to a remote
CPU. This keeps the ingress CPU free to process packets at near line
rate (our DDoS use-case). If we allocate the SKB on the ingress-CPU
before the redirect, we destroy this load-balancing model and create the
exact bottleneck we designed CPUMAP to avoid.

To bring the focus back to the specific problem this series solves,
let's review the concrete use case. Our IPsec scenario is a key example:
on the ingress CPU, an XDP program calculates a hash from inner packet
headers to load-balance traffic via CPUMAP. When the packet arrives on
the remote CPU, this hash is lost, so the new SKB is created with a hash
of zero. This, in turn, causes poor load-balancing when the packet is
forwarded to a multi-queue device like veth, as traffic often collapses
to a single queue.

The purpose of this patchset is simply to provide a standard way to
carry that hash to the remote CPU within the xdp_frame. (Same goes for a
standard way to carry VLAN tags.)

Given this specific problem, is there a better approach to solving it
than what this patchset proposes?

--Jesper
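
For readers following along, the hash problem described above can be
modelled in a few lines of userspace C. This is only a toy sketch under
stated assumptions, not code from the patchset. struct toy_frame,
mix32(), inner_flow_hash(), pick_index() and remote_tx_queue() are
hypothetical names. The mixer is a generic 32-bit finalizer rather than
the kernel's hash. The queue mapping borrows only the multiply-and-shift
idea of the kernel's reciprocal_scale(). The model assumes that a lost
hash reaches queue selection as zero, as the mail describes; real queue
selection in the stack has more steps.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-packet state that survives a
 * redirect. The real struct xdp_frame looks different; rx_hash models
 * the value the series wants to carry to the remote CPU. */
struct toy_frame {
	uint32_t inner_saddr;	/* inner (decapsulated) source address */
	uint32_t inner_daddr;	/* inner destination address */
	uint32_t rx_hash;	/* 0 means "no hash available" */
};

/* Generic 32-bit mixer (MurmurHash3 finalizer), an assumption for the
 * sketch and not the hash the kernel or any driver uses. */
static uint32_t mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bu;
	h ^= h >> 13;
	h *= 0xc2b2ae35u;
	h ^= h >> 16;
	return h;
}

/* Ingress CPU: derive a flow hash from the inner headers. */
static uint32_t inner_flow_hash(const struct toy_frame *f)
{
	uint32_t h = mix32(f->inner_saddr ^ mix32(f->inner_daddr));

	return h ? h : 1;	/* keep 0 reserved for "no hash" */
}

/* Map a 32-bit hash onto [0, n) by multiply-and-shift. */
static uint32_t pick_index(uint32_t hash, uint32_t n)
{
	assert(n > 0);
	return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/* Remote CPU: choose a TX queue for the newly built packet. With
 * carry_hash == 0 the hash computed on the ingress CPU is lost and
 * the packet starts out with hash 0. */
static uint32_t remote_tx_queue(const struct toy_frame *f, int carry_hash,
				uint32_t nqueues)
{
	uint32_t skb_hash = carry_hash ? f->rx_hash : 0;

	return pick_index(skb_hash, nqueues);
}
```

In this model, calling remote_tx_queue() with carry_hash == 0 returns
queue 0 for every flow, which is the collapse onto a single queue. With
the hash carried, the queue follows the inner-header hash.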