Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit()
From: Stanislav Fomichev <hidden>
Date: 2025-06-02 15:28:53
Also in:
lkml
On 06/02, Eryk Kubanski wrote:
> > I'm not sure I understand what's the issue here. If you're using the
> > same XSK from different CPUs, you should take care of the ordering
> > yourself on the userspace side?
>
> It's not a problem with the user-space Completion Queue READER side.
> I'm talking exclusively about the kernel-space Completion Queue WRITE side.
>
> This problem can occur when multiple sockets are bound to the same
> umem, device, queue id. In this situation the Completion Queue is shared.
> This means it can be accessed by multiple threads on the kernel side.
> Any use is indeed protected by a spinlock, however any write sequence
> (Acquire write slot as writer, write to slot, submit write slot to reader)
> isn't atomic in any way and it's possible to submit not-yet-sent packet
> descriptors back to user-space as TX completed.
>
> Up until now, all write-back operations had two phases, each phase
> locks the spinlock and unlocks it:
> 1) Acquire slot + Write descriptor (increase cached-writer by N + write values)
> 2) Submit slot to the reader (increase writer by N)
>
> Slot submission was solely based on the timing. Let's consider a situation
> where two different threads issue a syscall for two different AF_XDP sockets
> that are bound to the same umem, dev, queue-id.
>
> AF_XDP setup:
>
>                          kernel-space
>
>            Write                        Read
>             +--+                        +--+
>             |  |                        |  |
>             |  |                        |  |
>             |  |                        |  |
>  Completion |  |                        |  | Fill
>  Queue      |  |                        |  | Queue
>             |  |                        |  |
>             |  |                        |  |
>             |  |                        |  |
>             |  |                        |  |
>             +--+                        +--+
>             Read                        Write
>
>                          user-space
>
>    +--------+         +--------+
>    | AF_XDP |         | AF_XDP |
>    +--------+         +--------+
>
> Possible out-of-order scenario:
>
>                               writer        cached_writer1                      cached_writer2
>                                  |                 |                                   |
>                                  |                 |                                   |
>                                  |                 |                                   |
>                                  |                 |                                   |
>                   +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+
>                   |              |        |        |        |        |        |        |                                              |
>  Completion Queue |              |        |        |        |        |        |        |                                              |
>                   |              |        |        |        |        |        |        |                                              |
>                   +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+
>                                  |                 |                                   |
>                                  |                 |                                   |
>                                  |-----------------|                                   |
>                                    A) T1 syscall   |                                   |
>                                    writes 2        |                                   |
>                                    descriptors     |-----------------------------------|
>                                                      B) T2 syscall
>                                                      writes 4 descriptors
>
> Notes:
> 1) T1 and T2 AF_XDP sockets are two different sockets,
>    __xsk_generic_xmit will obtain two different mutexes.
> 2) T1 and T2 can be executed simultaneously, there is no
>    critical section whatsoever between them.
XSK represents a single queue, and each queue is single-producer,
single-consumer. The fact that you can dup a socket and call sendmsg()
from different threads/processes does not lift that restriction. I think
if you add synchronization on the userspace side
(lock(); sendmsg(); unlock();), that should help, right?