Re: [PATCH bpf-next V2 0/7] xdp: Allow BPF to set RX hints for XDP_REDIRECTed packets
From: Jesper Dangaard Brouer <hawk@kernel.org>
Date: 2025-07-31 16:27:13
Also in:
bpf
On 29/07/2025 21.47, Martin KaFai Lau wrote:
> On 7/29/25 4:15 AM, Jesper Dangaard Brouer wrote:
>> That idea has been considered before, but it unfortunately doesn't work from a performance angle. The performance model of XDP_REDIRECT into CPUMAP relies on moving the expensive SKB allocation+init to a remote CPU. This keeps the ingress CPU free to process packets at near line rate (our DDoS use-case). If we allocate the SKB on the ingress-CPU before the redirect, we destroy this load-balancing model and create the exact bottleneck we designed CPUMAP to avoid.
>
> iirc, a xdp prog can be attached to a cpumap. The skb can be created by that xdp prog running on the remote cpu. It should be like a xdp prog returning a XDP_PASS + an optional skb. The xdp prog can set some fields in the skb. Other than setting fields in the skb, something else may be also possible in the future, e.g. look up sk, earlier demux ...etc.
I have strong reservations about having the BPF program itself trigger the SKB allocation. I believe this would fundamentally break the performance model that makes cpumap redirect so effective.

The key to XDP's high performance lies in processing a bulk of xdp_frames in a tight loop to amortize per-packet costs. The existing cpumap code on the remote CPU is already highly optimized for this: it bulk-allocates SKBs and uses careful prefetching to hide memory latency. Allowing a BPF program to sometimes trigger a heavyweight SKB alloc+init (4 cache-line misses) would bypass all of these optimizations. It would introduce significant jitter into the pipeline and disrupt the entire bulk-processing model we rely on for performance.

This performance is not just theoretical; we rely on it for DDoS protection. For example, our plan is to use the XDP program on the cpumap hook to run secondary DDoS mitigation rules that currently use iptables (funnily enough, many of these rules are actually BPF program snippets today).

Architecturally, there is a clean separation today: the BPF program makes a decision, and the highly optimized cpumap or core kernel code acts on it (build_skb, napi_gro_receive, etc.). Your proposal blurs this line significantly.

Our patch, in contrast, preserves this model. It simply provides the necessary data (the RX hash, VLAN tag and timestamp) to the existing cpumap/veth SKB path via the xdp_frame.

While more advanced capabilities are an interesting topic for the future, my goal here is to solve the immediate, concrete problem of transferring metadata cleanly, without disrupting the performance architecture we rely on for use cases like DDoS mitigation.

--Jesper