Re: [PATCH net v4 2/2] xsk: Fix zero-copy AF_XDP fragment drop
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: 2026-02-20 12:37:43
On Thu, Feb 19, 2026 at 02:55:29PM -0800, Jakub Kicinski wrote:
> On Tue, 17 Feb 2026 21:08:51 +0000 Nikhil P. Rao wrote:
> > AF_XDP should ensure that only a complete packet is sent to application.
> > In the zero-copy case, if the Rx queue gets full as fragments are being
> > enqueued, the remaining fragments are dropped. For the multi-buffer case,
> > add a check to ensure that the Rx queue has enough space for all
> > fragments of a packet before starting to enqueue them.
> >
> > Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
> > Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> > Signed-off-by: Nikhil P. Rao <redacted>
> > ---
> >  net/xdp/xsk.c | 23 +++++++++++++++--------
> >  1 file changed, 15 insertions(+), 8 deletions(-)
> >
> > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > index f2ec4f78bbb6..f7f816a5cb80 100644
> > --- a/net/xdp/xsk.c
> > +++ b/net/xdp/xsk.c
> > @@ -167,25 +167,32 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
> >  	struct xdp_buff_xsk *pos, *tmp;
> >  	struct list_head *xskb_list;
> >  	u32 contd = 0;
> > +	u32 num_desc;
> >  	int err;
> > -	if (frags)
> > +	if (frags) {
> > +		num_desc = xdp_get_shared_info_from_buff(xdp)->nr_frags + 1;
> >  		contd = XDP_PKT_CONTD;
>
> [1]
>
> > +	} else {
> > +		err = __xsk_rcv_zc(xs, xskb, len, contd);
> > +		if (err)
> > +			goto err;
> > +		return 0;
> > +	}
> > -	err = __xsk_rcv_zc(xs, xskb, len, contd);
> > -	if (err)
> > +	if (xskq_prod_nb_free(xs->rx, num_desc) < num_desc) {
>
> We can pull this check into the branch at [1]
> It will let us preserve the existing flow.

Hi Jakub, that would work, yes.

> Either that or handle the non-frag case fully upfront:
>
> 	if (likely(!frags)) {
> 		err = __xsk_rcv_zc(xs, xskb, len, 0);
> 		if (err)
> 			goto err;
> 		return 0;
> 	}
>
> As is you have a weird mix of the two.
>
> > +		xs->rx_queue_full++;
> > +		err = -ENOBUFS;
> >  		goto err;
> > -	if (likely(!frags))
> > -		return 0;
> > +	}
> > +	__xsk_rcv_zc(xs, xskb, len, contd);
>
> Personal preference perhaps but removing error checking always gives me
> pause. Maybe:
>
> 	bool frag_fail;
>
> 	frag_fail = __xsk_rcv_zc(xs, xskb, len, contd);
> 	list_for_each...
> 		...
> 		frag_fail |= __xsk_rcv_zc(xs, xskb, len, contd);
>
> 	DEBUG_NET_WARN_ON_ONCE(frag_fail);

Error checking can actually be skipped here, as xskq_prod_nb_free() has
already peeked into the xsk Rx queue and told us there is enough space
for descriptor production. I have sent a patch that adds a variant of
__xsk_rcv_zc() that skips xskq_prod_reserve_desc():

https://lore.kernel.org/bpf/20260218150000.301176-1-maciej.fijalkowski@intel.com/

The logistics of these patches (this set and the patch linked above) are a
bit of a question to me though, since what Nikhil sent are clearly fixes
that need backports, whereas mine was sent as an improvement towards the
-next tree. However, the path that Nikhil touched here should be adjusted
to what my patch introduces. I might do this as a follow-up once bpf is
merged to bpf-next.

Nikhil, I also see you routed the set to the 'net' tree; previously xsk
core was handled via bpf/bpf-next.
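
To make the argument about skipping the per-fragment error check concrete,
here is a rough userspace sketch of the "check free space once, then
enqueue everything" pattern. It is only a toy model: struct toy_ring,
toy_nb_free(), toy_produce() and toy_rcv_packet() are hypothetical names
made up for illustration and are not the kernel's xsk_queue code. The
assert() stands in for the DEBUG_NET_WARN_ON_ONCE() idea suggested above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define RING_SIZE 4u	/* toy capacity, must be a power of two */

/* Hypothetical stand-in for an Rx ring; NOT the kernel's struct xsk_queue. */
struct toy_ring {
	uint32_t prod;			/* free-running producer index */
	uint32_t cons;			/* free-running consumer index */
	uint32_t len[RING_SIZE];	/* one "descriptor" (length only) per slot */
	uint32_t queue_full;		/* drop counter, in the spirit of rx_queue_full */
};

/* Number of free slots; unsigned wraparound keeps prod - cons correct. */
static uint32_t toy_nb_free(const struct toy_ring *r)
{
	return RING_SIZE - (r->prod - r->cons);
}

/* Enqueue one descriptor; returns -ENOBUFS if the ring is full. */
static int toy_produce(struct toy_ring *r, uint32_t len)
{
	if (toy_nb_free(r) == 0)
		return -ENOBUFS;
	r->len[r->prod++ & (RING_SIZE - 1)] = len;
	return 0;
}

/*
 * All-or-nothing receive: either every fragment of the packet is
 * enqueued, or none is and the drop counter is bumped.
 */
static int toy_rcv_packet(struct toy_ring *r, const uint32_t *frag_len,
			  uint32_t num_desc)
{
	int failed = 0;
	uint32_t i;

	if (toy_nb_free(r) < num_desc) {
		r->queue_full++;
		return -ENOBUFS;
	}

	/* Space was verified above, so these calls are not expected to fail. */
	for (i = 0; i < num_desc; i++)
		failed |= toy_produce(r, frag_len[i]);

	assert(!failed);	/* debug-only sanity check, not an error path */
	return 0;
}
```

With a 4-slot ring, a first 3-fragment packet fits and a second one is
rejected as a whole, so the consumer never sees a partial packet. The
sketch assumes a single producer; with that assumption nothing can shrink
the free space between the check and the enqueue loop.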

> ?
>
> >  	xskb_list = &xskb->pool->xskb_list;
> >  	list_for_each_entry_safe(pos, tmp, xskb_list, list_node) {
> >  		if (list_is_singular(xskb_list))
> >  			contd = 0;
> >  		len = pos->xdp.data_end - pos->xdp.data;
> > -		err = __xsk_rcv_zc(xs, pos, len, contd);
> > -		if (err)
> > -			goto err;
> > +		__xsk_rcv_zc(xs, pos, len, contd);
> >  		list_del_init(&pos->list_node);
> >  	}
>
> --
> pw-bot: cr