Re: [PATCH net-next v4 00/14] netkit: Support for io_uring zero-copy and AF_XDP
From: David Wei <hidden>
Date: 2025-11-05 00:43:55
Also in: bpf
On 2025-11-04 15:22, Stanislav Fomichev wrote:
> On 10/31, Daniel Borkmann wrote:
>> Containers use virtual netdevs to route traffic from a physical netdev in the host namespace. They do not have access to the physical netdev in the host and thus can't use memory providers or AF_XDP that require reconfiguring/restarting queues in the physical netdev.
>>
>> This patchset adds the concept of queue peering to virtual netdevs that allow containers to use memory providers and AF_XDP at native speed. These mapped queues are bound to a real queue in a physical netdev and act as a proxy.
>>
>> Memory providers and AF_XDP operations take an ifindex and queue id, so containers would pass in an ifindex for a virtual netdev and a queue id of a mapped queue, which then gets proxied to the underlying real queue. Peered queues are created and bound to a real queue atomically through a generic ynl netdev operation.
>>
>> We have implemented support for this concept in netkit and tested the latter against Nvidia ConnectX-6 (mlx5) as well as Broadcom BCM957504 (bnxt_en) 100G NICs.
>>
>> For more details see the individual patches.
>>
>> v3->v4:
>> - ndo_queue_create store dst queue via arg (Nikolay)
>> - Small nits like a spelling issue + rev xmas (Nikolay)
>> - admin-perm flag in bind-queue spec (Jakub)
>> - Fix potential ABBA deadlock situation in bind (Jakub, Paolo, Stan)
>> - Add a peer dev_tracker to not reuse the sysfs one (Jakub)
>> - New patch (12/14) to handle the underlying device going away (Jakub)
>> - Improve commit message on queue-get (Jakub)
>> - Do not expose phys dev info from container on queue-get (Jakub)
>> - Add netif_put_rx_queue_peer_locked to simplify code (Stan)
>> - Rework xsk handling to simplify the code and drop a few patches
>> - Rebase and retested everything with mlx5 + bnxt_en
>
> I mostly looked at patches 1-8 and they look good to me. Will it be possible to put your sample runs from 13 and 14 into a selftest form? Even if you require real hw, that should be doable, similar to tools/testing/selftests/drivers/net/hw/devmem.py, right?

Thanks for taking a look. For io_uring at least, it requires both a routable VIP that can be assigned to the netkit in a netns and a BPF program for skb forwarding. I could add a selftest, but it'll be hard to generalise across all envs.

I'm hoping to get self-contained QEMU VM selftest support first. WDYT?
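
As an aside for readers of the thread, the queue-peering proxy described in the quoted cover letter can be pictured with a toy userspace model. This is only an illustrative sketch, not the netkit implementation: `QueueRef`, `PeerTable`, `bind`, `resolve` and `unbind_device` are hypothetical names, the ifindex and queue numbers are made up, and rejecting a second bind of the same peered queue is an assumption rather than documented behaviour.

```python
# Toy userspace model of queue peering. NOT kernel code; every name here
# is hypothetical and exists only to illustrate the proxy lookup.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class QueueRef:
    """An (ifindex, queue id) pair, as passed by memory providers / AF_XDP."""
    ifindex: int
    queue_id: int


@dataclass
class PeerTable:
    # Maps a peered queue on a virtual netdev to a real queue on a physical one.
    peers: dict = field(default_factory=dict)

    def bind(self, virt: QueueRef, real: QueueRef) -> None:
        """Create a peered queue and bind it to a real queue in one step."""
        if virt in self.peers:
            # Assumption: a peered queue can only be bound once.
            raise ValueError(f"{virt} is already bound")
        self.peers[virt] = real

    def resolve(self, ref: QueueRef) -> QueueRef:
        """Proxy a peered queue to its real queue; other queues pass through."""
        return self.peers.get(ref, ref)

    def unbind_device(self, phys_ifindex: int) -> None:
        """Drop every peering that points at a physical device going away."""
        self.peers = {
            virt: real
            for virt, real in self.peers.items()
            if real.ifindex != phys_ifindex
        }


if __name__ == "__main__":
    table = PeerTable()
    # Made-up numbers: netkit ifindex 7 queue 1 -> physical ifindex 2 queue 15.
    table.bind(QueueRef(ifindex=7, queue_id=1), QueueRef(ifindex=2, queue_id=15))
    print(table.resolve(QueueRef(ifindex=7, queue_id=1)))
```

In this model the container only ever names the virtual netdev's ifindex and the mapped queue id; `resolve` stands in for the step where the operation is redirected to the underlying real queue, and `unbind_device` loosely mirrors the case of the underlying device going away.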