Thread: 16 messages, 5 authors, 2025-07-04

Re: Re: [PATCH bpf v2] xsk: Fix out of order segment free in __xsk_generic_xmit()

From: Stanislav Fomichev <hidden>
Date: 2025-06-02 15:28:53
Also in: lkml

On 06/02, Eryk Kubanski wrote:
quoted
I'm not sure I understand what's the issue here. If you're using the
same XSK from different CPUs, you should take care of the ordering
yourself on the userspace side?
It's not a problem on the user-space Completion Queue READER side.
I'm talking exclusively about the kernel-space Completion Queue WRITE side.

This problem can occur when multiple sockets are bound to the same
umem, device, and queue id. In this situation the Completion Queue is
shared, which means it can be accessed by multiple threads on the kernel side.
Each access is indeed protected by a spinlock, but the write sequence as a whole
(acquire a write slot as writer, write to the slot, submit the slot to the reader)
isn't atomic in any way, so it's possible to submit not-yet-sent packet
descriptors back to user-space as TX completed.
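
For reference, the shared setup described above (a second socket bound to
the same umem, device and queue id) could be requested from user-space
roughly as follows. This is a minimal sketch, assuming the uapi definitions
in <linux/if_xdp.h>; the helper name xsk_shared_addr() and its parameters
are hypothetical and not taken from this thread.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

#ifndef AF_XDP
#define AF_XDP 44 /* fallback for older libc headers */
#endif

/*
 * Hypothetical helper: build the bind() address for a socket that shares
 * the umem owned by umem_owner_fd on the same ifindex and queue_id.
 */
static struct sockaddr_xdp xsk_shared_addr(unsigned int ifindex,
					   unsigned int queue_id,
					   int umem_owner_fd)
{
	struct sockaddr_xdp sxdp;

	assert(umem_owner_fd >= 0);

	memset(&sxdp, 0, sizeof(sxdp));
	sxdp.sxdp_family = AF_XDP;
	sxdp.sxdp_ifindex = ifindex;
	sxdp.sxdp_queue_id = queue_id;
	sxdp.sxdp_flags = XDP_SHARED_UMEM;	/* reuse the owner's umem */
	sxdp.sxdp_shared_umem_fd = (__u32)umem_owner_fd;
	return sxdp;
}
```

The caller would then pass the result to bind(fd, (struct sockaddr *)&sxdp,
sizeof(sxdp)) on the second socket.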

Up until now, all write-back operations had two phases, each of which
locks the spinlock and unlocks it:
1) Acquire slot + Write descriptor (increase cached-writer by N + write values)
2) Submit slot to the reader (increase writer by N)
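
The two phases can be modelled in a few lines of user-space C. This is
only a toy sketch, not the kernel's ring implementation: struct toy_cq,
the sent[] flag and all toy_* names are assumptions made up for
illustration, and the interleaving is replayed on a single thread rather
than raced. It shows one caller reserving 2 slots, a second caller
reserving 4, and the second caller submitting first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_CQ_SIZE 8u /* hypothetical ring size */

/* Toy model of a shared completion ring (illustration only). */
struct toy_cq {
	uint32_t cached_prod;		/* advanced in phase 1 */
	uint32_t producer;		/* advanced in phase 2, seen by reader */
	uint64_t addr[TOY_CQ_SIZE];
	bool sent[TOY_CQ_SIZE];		/* was the frame really transmitted? */
};

/* Phase 1: reserve n slots and write the descriptors into them. */
static uint32_t toy_cq_reserve(struct toy_cq *cq, const uint64_t *addrs,
			       uint32_t n)
{
	uint32_t first = cq->cached_prod;

	for (uint32_t i = 0; i < n; i++) {
		cq->addr[(first + i) % TOY_CQ_SIZE] = addrs[i];
		cq->sent[(first + i) % TOY_CQ_SIZE] = false;
	}
	cq->cached_prod += n;
	return first;
}

/* Transmission finished for the n frames starting at slot first. */
static void toy_mark_sent(struct toy_cq *cq, uint32_t first, uint32_t n)
{
	for (uint32_t i = 0; i < n; i++)
		cq->sent[(first + i) % TOY_CQ_SIZE] = true;
}

/* Phase 2: publish n slots, without regard to who reserved them. */
static void toy_cq_submit(struct toy_cq *cq, uint32_t n)
{
	cq->producer += n;
}

/* Count published slots whose frame has not been transmitted yet. */
static uint32_t toy_cq_premature(const struct toy_cq *cq, uint32_t consumer)
{
	uint32_t bad = 0;

	for (uint32_t i = consumer; i != cq->producer; i++)
		if (!cq->sent[i % TOY_CQ_SIZE])
			bad++;
	return bad;
}

/* Replay: T1 reserves 2, T2 reserves 4, T2 completes and submits first. */
static uint32_t toy_scenario(void)
{
	struct toy_cq cq = {0};
	const uint64_t t1[2] = {0x0000, 0x1000};
	const uint64_t t2[4] = {0x2000, 0x3000, 0x4000, 0x5000};
	uint32_t f1 = toy_cq_reserve(&cq, t1, 2);	/* slots 0..1 */
	uint32_t f2 = toy_cq_reserve(&cq, t2, 4);	/* slots 2..5 */

	assert(f1 == 0 && f2 == 2);
	toy_mark_sent(&cq, f2, 4);
	toy_cq_submit(&cq, 4);	/* publishes slots 0..3, not 2..5 */
	return toy_cq_premature(&cq, 0);
}
```

In this model toy_scenario() reports two published slots that belong to
the first caller and were never marked as sent, while the last two slots
of the second caller stay unpublished.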

Slot submission was based solely on timing. Consider a situation
where two different threads issue a syscall for two different AF_XDP sockets
that are bound to the same umem, dev, and queue id.

AF_XDP setup:
                                                            
                             kernel-space                   
                                                            
           Write   Read                                     
            +--+   +--+                                     
            |  |   |  |                                     
            |  |   |  |                                     
            |  |   |  |                                     
 Completion |  |   |  | Fill                                
 Queue      |  |   |  | Queue                               
            |  |   |  |                                     
            |  |   |  |                                     
            |  |   |  |                                     
            |  |   |  |                                     
            +--+   +--+                                     
            Read   Write                                    
                             user-space                     
                                                            
                                                            
   +--------+         +--------+                            
   | AF_XDP |         | AF_XDP |                            
   +--------+         +--------+                            
                                                            
                                                            
                                                            
                                                            

Possible out-of-order scenario:
                                                                                                                                       
                                                                                                                                       
                              writer         cached_writer1                      cached_writer2                                        
                                 |                 |                                   |                                               
                                 |                 |                                   |                                               
                                 |                 |                                   |                                               
                                 |                 |                                   |                                               
                  +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+
                  |              |        |        |        |        |        |        |                                              |
 Completion Queue |              |        |        |        |        |        |        |                                              |
                  |              |        |        |        |        |        |        |                                              |
                  +--------------|--------|--------|--------|--------|--------|--------|----------------------------------------------+
                                 |                 |                                   |                                               
                                 |                 |                                   |                                               
                                 |-----------------|                                   |                                               
                                  A) T1 syscall    |                                   |                                               
                                  writes 2         |                                   |                                               
                                  descriptors      |-----------------------------------|                                               
                                                    B) T2 syscall writes 4 descriptors                                                 
                                                                                                                                       
                                                                                                                                       
                                                                                                                                       
                                                                                                                                       
                 Notes:                                                                                                                
                 1) T1 and T2 AF_XDP sockets are two different sockets,                                                                
                    __xsk_generic_xmit will obtain two different mutexes.                                                              
                 2) T1 and T2 can be executed simultaneously, there is no                                                              
                    critical section whatsoever between them.                                                                          
XSK represents a single queue and each queue is single producer single
consumer. The fact that you can dup a socket and call sendmsg from
different threads/processes does not lift that restriction. I think
if you add synchronization in userspace (lock(); sendmsg();
unlock();), that should help, right?
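
The lock(); sendmsg(); unlock(); pattern could be sketched like this. It
is a minimal illustration that assumes a single process-wide pthread
mutex and a zero-length sendto() as the TX kick; the names xsk_tx_lock
and xsk_kick_tx_serialized() are hypothetical. It only serializes the
syscalls themselves and makes no claim about what the kernel does after
the call returns.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical lock shared by every thread that transmits on the queue. */
static pthread_mutex_t xsk_tx_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Kick TX on xsk_fd while holding the lock, so that at most one thread
 * is inside the send syscall at a time.
 * Returns 0 on success or a negative errno value.
 */
static int xsk_kick_tx_serialized(int xsk_fd)
{
	int ret = 0;
	int rc;

	rc = pthread_mutex_lock(&xsk_tx_lock);
	assert(rc == 0);

	if (sendto(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, 0) < 0)
		ret = -errno;

	rc = pthread_mutex_unlock(&xsk_tx_lock);
	assert(rc == 0);

	return ret;
}
```

Every thread that shares the queue would call xsk_kick_tx_serialized()
instead of calling sendto() or sendmsg() directly.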