Thread: 6 messages, 3 authors, 2026-02-20

Re: [PATCH net v4 2/2] xsk: Fix zero-copy AF_XDP fragment drop

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: 2026-02-20 12:37:43

On Thu, Feb 19, 2026 at 02:55:29PM -0800, Jakub Kicinski wrote:
On Tue, 17 Feb 2026 21:08:51 +0000 Nikhil P. Rao wrote:
AF_XDP should ensure that only a complete packet is sent to application.
In the zero-copy case, if the Rx queue gets full as fragments are being
enqueued, the remaining fragments are dropped.

For the multi-buffer case, add a check to ensure that the Rx queue has
enough space for all fragments of a packet before starting to enqueue
them.

Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Nikhil P. Rao <redacted>
---
 net/xdp/xsk.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index f2ec4f78bbb6..f7f816a5cb80 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -167,25 +167,32 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len)
 	struct xdp_buff_xsk *pos, *tmp;
 	struct list_head *xskb_list;
 	u32 contd = 0;
+	u32 num_desc;
 	int err;
 
-	if (frags)
+	if (frags) {
+		num_desc = xdp_get_shared_info_from_buff(xdp)->nr_frags + 1;
 		contd = XDP_PKT_CONTD;
[1]
+	} else {
+		err = __xsk_rcv_zc(xs, xskb, len, contd);
+		if (err)
+			goto err;
+		return 0;
+	}
 
-	err = __xsk_rcv_zc(xs, xskb, len, contd);
-	if (err)
+	if (xskq_prod_nb_free(xs->rx, num_desc) < num_desc) {
We can pull this check into the branch at [1]
It will let us preserve the existing flow.
Hi Jakub,

that would work, yes.
Either that or handle the non-frag case fully upfront:

if (likely(!frags)) {
	err = __xsk_rcv_zc(xs, xskb, len, 0);
	if (err)
		goto err;
	return 0;
}

As is you have a weird mix of the two.
+		xs->rx_queue_full++;
+		err = -ENOBUFS;
 		goto err;
-	if (likely(!frags))
-		return 0;
+	}
 
+	__xsk_rcv_zc(xs, xskb, len, contd);
Personal preference perhaps but removing error checking always
gives me pause. Maybe:

	bool frag_fail;

	frag_fail = __xsk_rcv_zc(xs, xskb, len, contd);
	list_for_each...
		...
		frag_fail |= __xsk_rcv_zc(xs, xskb, len, contd);
	DEBUG_NET_WARN_ON_ONCE(frag_fail);
Error checking can actually be skipped here, as xskq_prod_nb_free() has
already peeked into the xsk Rx queue and told us there is enough space for
descriptor production.
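
To illustrate the reasoning above, here is a minimal userspace sketch of the
"check free space once, then produce without per-descriptor error handling"
pattern. It is not the kernel code: struct ring, ring_nb_free(),
ring_produce_unchecked(), ring_enqueue_packet(), RING_SIZE and PKT_CONTD are
all hypothetical stand-ins, loosely modelled on xskq_prod_nb_free(),
__xsk_rcv_zc() and XDP_PKT_CONTD.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define RING_SIZE 8u	/* hypothetical ring size, must be a power of two */
#define PKT_CONTD 1u	/* hypothetical stand-in for XDP_PKT_CONTD */

struct desc {
	uint32_t len;
	uint32_t options;
};

struct ring {
	uint32_t prod;		/* free-running producer index */
	uint32_t cons;		/* free-running consumer index */
	uint64_t queue_full;	/* packets rejected for lack of space */
	struct desc descs[RING_SIZE];
};

/* Number of free slots; unsigned wraparound keeps the subtraction valid. */
static uint32_t ring_nb_free(const struct ring *r)
{
	return RING_SIZE - (r->prod - r->cons);
}

/* Produce one descriptor. The caller must already have verified space. */
static void ring_produce_unchecked(struct ring *r, uint32_t len,
				   uint32_t options)
{
	/* Debug-only invariant check, in the spirit of a WARN_ON_ONCE. */
	assert(ring_nb_free(r) > 0);
	r->descs[r->prod & (RING_SIZE - 1)] = (struct desc){ len, options };
	r->prod++;
}

/* Enqueue every fragment of a packet, or none of them. */
static int ring_enqueue_packet(struct ring *r, const uint32_t *frag_len,
			       uint32_t nr_frags)
{
	if (ring_nb_free(r) < nr_frags) {
		r->queue_full++;
		return -ENOBUFS;
	}

	for (uint32_t i = 0; i < nr_frags; i++) {
		/* Every fragment but the last carries the "continued" flag. */
		uint32_t options = (i + 1 < nr_frags) ? PKT_CONTD : 0;

		ring_produce_unchecked(r, frag_len[i], options);
	}
	return 0;
}
```

Because the single up-front check covers the whole packet, a failure can only
happen before the first descriptor is written, so the consumer never sees a
partial packet.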

I have sent a patch that adds a variant of __xsk_rcv_zc() that skips
xskq_prod_reserve_desc():

https://lore.kernel.org/bpf/20260218150000.301176-1-maciej.fijalkowski@intel.com/

The logistics of these patches (this set and the patch linked above) are a
bit of a question to me, though, since what Nikhil sent are clearly fixes
that need backports, whereas mine was sent as an improvement targeting the
-next tree. However, the path that Nikhil touched here should be adjusted to
what my patch introduces. I might do this as a follow-up once bpf is merged
into bpf-next.

Nikhil, I also see you routed the set to the 'net' tree; previously xsk core
was handled via bpf/bpf-next?
 	xskb_list = &xskb->pool->xskb_list;
 	list_for_each_entry_safe(pos, tmp, xskb_list, list_node) {
 		if (list_is_singular(xskb_list))
 			contd = 0;
 		len = pos->xdp.data_end - pos->xdp.data;
-		err = __xsk_rcv_zc(xs, pos, len, contd);
-		if (err)
-			goto err;
+		__xsk_rcv_zc(xs, pos, len, contd);
 		list_del_init(&pos->list_node);
 	}
 
-- 
pw-bot: cr