Thread (33 messages), 3 authors, 2024-02-23

Re: [PATCH net-next 0/5] virtio-net: sq support premapped mode

From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Date: 2024-01-25 05:52:16
Also in: bpf, virtualization

On Thu, 25 Jan 2024 13:42:05 +0800, Xuan Zhuo [off-list ref] wrote:
On Thu, 25 Jan 2024 11:39:28 +0800, Jason Wang [off-list ref] wrote:
On Tue, Jan 16, 2024 at 3:59 PM Xuan Zhuo [off-list ref] wrote:
This is the second part of virtio-net support AF_XDP zero copy.

The whole patch set
http://lore.kernel.org/all/20231229073108.57778-1-xuanzhuo@linux.alibaba.com (local)

## About the branch

This patch set is pushed to the net-next branch, but some patches are about
virtio core. Because the entire patch set for virtio-net to support AF_XDP
should be pushed to net-next, I hope these patches will be merged into net-next
with the virtio core maintainers' Acked-by.

============================================================================

## AF_XDP

XDP socket (AF_XDP) is an excellent kernel-bypass network framework. The zero
copy feature of xsk (XDP socket) needs to be supported by the driver. The
performance of zero copy is very good. mlx5 and Intel ixgbe already support
this feature. This patch set allows virtio-net to support xsk's zerocopy xmit
feature.

At present, we have completed some preparation:

1. vq-reset (virtio spec and kernel code)
2. virtio-core premapped dma
3. virtio-net xdp refactor

So it is time for Virtio-Net to complete the support for the XDP Socket
Zerocopy.

Virtio-net cannot increase the number of queues at will, so xsk shares the
queues with the kernel.

On the other hand, virtio-net does not support generating an interrupt manually
from the driver, so we use a trick to wake up tx xmit. If TX NAPI last ran on
another CPU, we use an IPI to wake up NAPI on that remote CPU. If it last ran
on the local CPU, we wake up NAPI directly.
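
The wakeup rule above can be sketched as a small userspace model. This is only an illustration of the decision, not the driver code: `struct sq_model`, `enum wake_action` and `choose_wake_action()` are hypothetical names, and the real patches would schedule NAPI or send an IPI where this sketch merely returns a value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of a TX wakeup request (illustration only). */
enum wake_action {
	WAKE_LOCAL_NAPI,	/* TX NAPI last ran here: wake it directly */
	WAKE_REMOTE_IPI,	/* TX NAPI last ran elsewhere: send an IPI */
};

/* Hypothetical per-send-queue state: which CPU last ran TX NAPI. */
struct sq_model {
	int last_napi_cpu;
};

/* Pick the wakeup path by comparing the current CPU with the last NAPI CPU. */
static enum wake_action choose_wake_action(const struct sq_model *sq,
					   int this_cpu)
{
	assert(sq != NULL);

	if (sq->last_napi_cpu == this_cpu)
		return WAKE_LOCAL_NAPI;

	return WAKE_REMOTE_IPI;
}
```

With `last_napi_cpu` set to 2, a wakeup issued on CPU 2 takes the direct path, while one issued on any other CPU takes the IPI path.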

This patch set includes some refactoring of virtio-net so that it can support
AF_XDP.

## performance

ENV: Qemu with vhost-user(polling mode).
Host CPU: Intel(R) Xeon(R) Platinum 8163 CPU @ 2.50GHz

### virtio PMD in guest with testpmd

testpmd> show port stats all

 ######################## NIC statistics for port 0 ########################
 RX-packets: 19531092064 RX-missed: 0     RX-bytes: 1093741155584
 RX-errors: 0
 RX-nombuf: 0
 TX-packets: 5959955552 TX-errors: 0     TX-bytes: 371030645664


 Throughput (since last show)
 Rx-pps:   8861574     Rx-bps:  3969985208
 Tx-pps:   8861493     Tx-bps:  3969962736
 ############################################################################

### AF_XDP PMD in guest with testpmd

testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 68152727   RX-missed: 0          RX-bytes:  3816552712
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 68114967   TX-errors: 33216      TX-bytes:  3814438152

  Throughput (since last show)
  Rx-pps:      6333196          Rx-bps:   2837272088
  Tx-pps:      6333227          Tx-bps:   2837285936
  ############################################################################

But AF_XDP consumes more CPU for tx and rx NAPI (100% and 86%).

## maintain

I am currently a reviewer for virtio-net. I commit to maintaining AF_XDP
support in virtio-net.

Please review.
Rethinking the whole design, I have one question:

The reason we need to store DMA information is to harden the virtqueue
to make sure the DMA unmap is safe. This seems redundant when the
buffers are premapped by the driver, for example:

Receive queue maintains DMA information, so it doesn't need desc_extra to work.

So can we simply

1) when premapping is enabled, store DMA information by driver itself
YES. This is simpler, and it is more convenient.
But the driver must allocate memory to store the DMA info.
2) don't store DMA information in desc_extra
YES. But the desc_extra memory is wasted. The "next" item is still used.
Do you think we should free the desc_extra when the vq is in premapped mode?

struct vring_desc_extra {
	dma_addr_t addr;		/* Descriptor DMA addr. */
	u32 len;			/* Descriptor length. */
	u16 flags;			/* Descriptor flags. */
	u16 next;			/* The next desc state in a list. */
};


The flags and the next are used whether the vq is premapped or not.

So I think we can add a new array to store the addr and len.
If the vq is premapped, the memory can be freed.

struct vring_desc_extra {
	u16 flags;			/* Descriptor flags. */
	u16 next;			/* The next desc state in a list. */
};

struct vring_desc_dma {
	dma_addr_t addr;		/* Descriptor DMA addr. */
	u32 len;			/* Descriptor length. */
};
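
A rough userspace sketch of how the split could work, assuming the DMA array is simply not allocated for a premapped vq. `struct vq_model` and the `vq_model_*()` helpers are hypothetical and exist only for this illustration; `dma_addr_t` is replaced by a `uint64_t` stand-in and `calloc()` stands in for the kernel allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t dma_addr_t;	/* userspace stand-in for the kernel type */

struct vring_desc_extra {
	uint16_t flags;		/* Descriptor flags. */
	uint16_t next;		/* The next desc state in a list. */
};

struct vring_desc_dma {
	dma_addr_t addr;	/* Descriptor DMA addr. */
	uint32_t len;		/* Descriptor length. */
};

/* Hypothetical model of the per-vq bookkeeping arrays. */
struct vq_model {
	unsigned int num;
	bool premapped;
	struct vring_desc_extra *extra;	/* always needed */
	struct vring_desc_dma *dma;	/* NULL when the vq is premapped */
};

/* Allocate flags/next for every vq, addr/len only when the core maps. */
static int vq_model_init(struct vq_model *vq, unsigned int num, bool premapped)
{
	assert(vq != NULL && num > 0);

	vq->num = num;
	vq->premapped = premapped;
	vq->dma = NULL;

	vq->extra = calloc(num, sizeof(*vq->extra));
	if (!vq->extra)
		return -1;

	if (!premapped) {
		vq->dma = calloc(num, sizeof(*vq->dma));
		if (!vq->dma) {
			free(vq->extra);
			vq->extra = NULL;
			return -1;
		}
	}

	return 0;
}

static void vq_model_destroy(struct vq_model *vq)
{
	free(vq->dma);
	free(vq->extra);
	vq->dma = NULL;
	vq->extra = NULL;
}

/* Bytes of bookkeeping memory currently held by the model. */
static size_t vq_model_bytes(const struct vq_model *vq)
{
	size_t bytes = vq->num * sizeof(*vq->extra);

	if (vq->dma)
		bytes += vq->num * sizeof(*vq->dma);

	return bytes;
}
```

In this model a premapped vq only pays for the flags/next array, so the addr/len storage moves entirely to the driver.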

Thanks.
Thanks.

Would this be simpler?

Thanks