Thread: 25 messages, 3 authors, 2026-05-26

Re: [PATCH net v4 0/5] xsk: fix meta and publish of cq issues

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: 2026-05-22 18:34:08
Also in: bpf

On Fri, May 22, 2026 at 09:48:39PM +0800, Jason Xing wrote:
On Fri, May 22, 2026 at 4:55 PM Jason Xing [off-list ref] wrote:
On Thu, May 21, 2026 at 10:24 PM Maciej Fijalkowski
[off-list ref] wrote:
On Thu, May 21, 2026 at 09:07:30PM +0800, Jason Xing wrote:
On Thu, May 21, 2026 at 9:00 PM Maciej Fijalkowski
[off-list ref] wrote:
On Thu, May 21, 2026 at 08:41:08PM +0800, Jason Xing wrote:
On Thu, May 21, 2026 at 8:24 PM Maciej Fijalkowski
[off-list ref] wrote:
On Wed, May 20, 2026 at 08:42:39AM +0800, Jason Xing wrote:
From: Jason Xing <kernelxing@tencent.com>

The series is the product of previous review from sashiko[1].

1) META
patch 1: address TOCTOU around metadata.

2) PUBLISH of CQ
patch 2: make sure xsk_addr->addrs[] can be published to cq when
         overflow occurs.
patch 3: keep cleaning up the continuation descs (more than 17) and
         publish its address when overflow occurs.
patch 4: like patch 3, but only handles the invalid descs cases.
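
As a rough illustration of what "draining" continuation descriptors in patches 3 and 4 could look like, here is a user-space sketch in C. It is not the kernel code from the series: struct tx_desc, drain_contd() and the flat cq array are hypothetical names made up for this example. The only assumption taken from the descriptions above is that every remaining fragment of the broken packet is consumed and its address is made available for reuse.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified Tx descriptor (not the kernel's struct xdp_desc). */
struct tx_desc {
	unsigned long addr;	/* UMEM address of the fragment */
	bool contd;		/* true if another fragment of this packet follows */
};

/*
 * Consume the rest of one multi-buffer packet starting at ring[pos] and
 * copy each fragment address into cq[], so that no buffer is leaked.
 * Stops after the last fragment (contd == false), at the end of the ring,
 * or when cq[] is full. Returns the number of descriptors drained.
 */
static size_t drain_contd(const struct tx_desc *ring, size_t n, size_t pos,
			  unsigned long *cq, size_t cq_cap)
{
	size_t drained = 0;

	while (pos < n && drained < cq_cap) {
		cq[drained++] = ring[pos].addr;
		if (!ring[pos].contd)
			break;	/* last fragment of the broken packet */
		pos++;
	}
	return drained;
}
```

With a ring holding a three-fragment packet followed by a single-fragment packet, draining from position 0 returns 3 and leaves the following packet untouched.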

[1]: https://lore.kernel.org/all/20260502200722.53960-1-kerneljasonxing@gmail.com/

---
V4
Link: https://lore.kernel.org/all/20260517063311.28921-1-kerneljasonxing@gmail.com/
1. correct the description of xmit path in patch 3 (sashiko)
2. move set logic into xmit path in patch 3 (Stan)

V3
Link: https://lore.kernel.org/all/20260515123018.80147-1-kerneljasonxing@gmail.com/
1. avoid breaking previous usage of sendto, and silently handle
overflow case (Stan, sashiko)
2. add one particular exception process in patch 4 (sashiko)
3. adjust the selftest so that it passes on both virtual and
physical machines, which includes adding usleep to support physical machines.

V2
Link: https://lore.kernel.org/all/20260510012310.88570-1-kerneljasonxing@gmail.com/
1. adjust selftests (Jakub)
2. add READ_ONCE in patch 1 (Stan)

FWIW I still get test failures (yes with patch 5 applied). PTAL.

Thanks for the test. But I've tried with ixgbe driver...

I noticed there are some flaky tests which have nothing to do with the
series. Can you confirm that it's not caused by the series?

That explains the different results, as I am using i40e/ice, which have
multi-buffer support, whereas ixgbe does not even support mbuf at XDP.
Broken tests are from mbuf cases.

That's weird. I never expected the failed tests to be about multi-buffer.

Are they the same as the output you attached last time? Or something
new? Could you please share it so that I can investigate the root
cause?

# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
---------------------------------------
not ok 21 FAIL: SKB ALIGNED_INV_DESC_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
---------------------------------------
not ok 33 FAIL: SKB UNALIGNED_INV_DESC_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
---------------------------------------
not ok 57 FAIL: DRV ALIGNED_INV_DESC_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
---------------------------------------
not ok 69 FAIL: DRV UNALIGNED_INV_DESC_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
---------------------------------------
not ok 93 FAIL: ZC ALIGNED_INV_DESC_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [11], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
---------------------------------------
not ok 94 FAIL: ZC TOO_MANY_FRAGS
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
---------------------------------------
not ok 105 FAIL: ZC UNALIGNED_INV_DESC_MULTI_BUFF

# 4 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# Totals: pass:96 fail:8 xfail:0 xpass:0 skip:4 error:0
XSK_SELFTESTS_ens259f1np1_SOFTIRQ: [ FAIL ]
1..108
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
---------------------------------------
not ok 21 FAIL: SKB BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
---------------------------------------
not ok 33 FAIL: SKB BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
---------------------------------------
not ok 57 FAIL: DRV BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
---------------------------------------
not ok 69 FAIL: DRV BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
---------------------------------------
not ok 93 FAIL: ZC BUSY-POLL ALIGNED_INV_DESC_MULTI_BUFF
# [is_frag_valid] expected pkt_nb [11], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
---------------------------------------
# [is_frag_valid] expected pkt_nb [10], got pkt_nb [0]
# DEBUG>> L2: dst mac: # 55# 44# 33# 22# 11# 01#
DEBUG>> L2: src mac: # 55# 44# 33# 22# 11# 00#
DEBUG>> L5: seqnum: # 0:0 # 0:1 # 0:2 # 0:3 # 0:4 # 0:5 # 0:6 # 0:7 # 0:8 # 0:9 # 0:10 # 0:11 # 0:0 # 0:0 # 0:0 # 0:0 # ....#
.... # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 # 0:0 #
---------------------------------------
not ok 105 FAIL: ZC BUSY-POLL UNALIGNED_INV_DESC_MULTI_BUFF

# 4 skipped test(s) detected. Consider enabling relevant config options to improve coverage.
# Totals: pass:96 fail:8 xfail:0 xpass:0 skip:4 error:0
XSK_SELFTESTS_ens259f1np1_BUSY_POLL: [ FAIL ]

Summary:
XSK_SELFTESTS_ens259f1np1_SOFTIRQ: [ FAIL ]
XSK_SELFTESTS_ens259f1np1_BUSY_POLL: [ FAIL ]

Sorry, Maciej. I managed to get one server with an i40e NIC but still
couldn't reproduce it. Can you try the attachment (that is the
replacement for v4-0005) instead? I removed those nasty CONT test
cases...

Ah, I think I eventually figured out a solution. Maciej, could you
please test the 2nd patch instead?

This patch reworks the CONTD test cases. Fingers crossed.

Please don't rush things here, I believe we need to think a bit more here.
I have second thoughts about the overall approach.

My understanding wrt the CQ was that it is a container holding descriptors
that have been successfully transmitted. Now we also want to add leftover
descriptors from broken packets, which might confuse user space
applications that rely on the behavior described above.

The intent is right of course as we don't want to lose UMEM descs, but I
feel like we need a separate mechanism for that rather than putting
invalid descs to CQ.

Does it make sense?

Besides, even if we stay with the proposed changes, behavior across
modes should be aligned. Right now ZC seems to be broken in the regions
touched here: when we hit the frag limit set by pool->xdp_zc_max_segs,
we break the loop and discard the packet, never posting it to the CQ, so
these descs are lost from the user space POV. On the next call we would
then continue and interpret the rest of the oversized packet as a
separate (clamped) one, and therefore submit a corrupted packet to HW.
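
The failure mode described above can be sketched as a small user-space model in C. This is not the kernel's ZC code: struct tx_desc, struct tx_stats and walk_ring() are hypothetical names, and the exact point at which the limit triggers (here: a fragment that still has the continuation flag set while the count already equals max_segs) is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified Tx descriptor: only the continuation flag matters. */
struct tx_desc {
	bool contd;	/* true if another fragment of this packet follows */
};

struct tx_stats {
	size_t sent;		/* descs handed to "HW" as part of some packet */
	size_t lost;		/* descs dropped and never completed */
	size_t corrupted_pkts;	/* packets that are really the tail of a dropped one */
};

/*
 * Model of the behavior described in the message: when the fragment count
 * reaches max_segs and the packet still continues, the collected fragments
 * are discarded, and parsing restarts at the next descriptor as if it began
 * a new packet. Fragments of an unfinished packet at the end of the ring
 * are left pending and not counted.
 */
static struct tx_stats walk_ring(const struct tx_desc *ring, size_t n,
				 size_t max_segs)
{
	struct tx_stats st = {0};
	size_t frags = 0;
	bool tail_of_dropped = false;

	for (size_t i = 0; i < n; i++) {
		frags++;
		if (!ring[i].contd) {
			/* End of packet: it is sent, even if it is only a tail. */
			if (tail_of_dropped)
				st.corrupted_pkts++;
			st.sent += frags;
			frags = 0;
			tail_of_dropped = false;
		} else if (frags == max_segs) {
			/* Limit hit: fragments vanish without a completion. */
			st.lost += frags;
			frags = 0;
			tail_of_dropped = true;
		}
	}
	return st;
}
```

For a single five-fragment packet and max_segs = 3, this model reports three lost descriptors and one corrupted two-fragment packet, which is the combination of leaked UMEM buffers and bogus frames that a mode-agnostic fix would need to rule out.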

I'll be looking at the ZC API, but I do think we need a common,
mode-agnostic approach.

Thanks,
Maciej
Thanks,
Jason
Really, I don't think I have much time to spend on these tests, which
makes me feel extremely annoyed... It's not easy to analyze the code
without a reproducer. The good news is that I now highly suspect that
these CONT test cases pollute the whole cq, which affects other
tests. Before I give up on the 0003/0004 patches, I'd like to hear
some advice from you. Thank you.

My original intention was to push batch xmit forward but at that time
sashiko pointed out some unrelated bugs accidentally.

Thanks,
Jason
Thanks,
Jason
Thanks,
Jason

Jason Xing (5):
  xsk: cache csum_start/csum_offset to fix TOCTOU in xsk_skb_metadata()
  xsk: fix buffer leak in xsk_drop_skb() for AF_XDP multi-buffer Tx
  xsk: drain continuation descs after overflow in xsk_build_skb()
  xsk: drain continuation descs on invalid descriptor in
    __xsk_generic_xmit()
  selftests/xsk: drain CQ to wait for TX completion

 include/net/xdp_sock.h                        |  1 +
 net/xdp/xsk.c                                 | 44 +++++++++++++----
 .../selftests/bpf/prog_tests/test_xsk.c       | 48 +++++++++++--------
 3 files changed, 63 insertions(+), 30 deletions(-)

--
2.43.7