Re: Question about the sndbuf of the tap interface with vhost-net
From: Harold Huang <hidden>
Date: 2022-02-24 07:31:29
Hi, Jason,
Jason Wang [off-list ref] wrote on Thu, Feb 24, 2022 at 12:40:
On Thu, Feb 24, 2022 at 12:19 PM Harold Huang [off-list ref] wrote:
Thanks for Jason's comments.
Jason Wang [off-list ref] wrote on Thu, Feb 24, 2022 at 11:23:
Adding netdev.
On Wed, Feb 23, 2022 at 9:46 PM Harold Huang [off-list ref] wrote:
Sorry. The performance tested by iperf is degraded from 4.5 Gbps to 750 Mbps per flow.
Harold Huang [off-list ref] wrote on Wed, Feb 23, 2022 at 21:13:
I see that in the DPDK virtio-user driver, TUNSETSNDBUF is initialized with INT_MAX, see: https://github.com/DPDK/dpdk/blob/main/drivers/net/virtio/virtio_user/vhost_kernel_tap.c#L169
Note that Linux uses INT_MAX as the default sndbuf for tuntap.
It is OK because the tap driver uses it to support tx batching, see this patch: https://github.com/torvalds/linux/commit/0a0be13b8fe2cac11da2063fb03f0f39359b3069
But in tun_xdp_one, NAPI is not supported and I want to use NAPI in tun_get_user to enable GRO.
NAPI is not enabled in this path, want to send a patch to do that?
Yes, I have a patch in this path to enable NAPI and it greatly improves TCP stream performance, from 4.5 Gbps to 9.2 Gbps per flow. I will send it later for comments.
Good to know that. Have you compared it with non-NAPI mode?
Do you mean using netif_rx? If so, I have tested it and the performance is about 5 Gbps. netif_rx calls process_backlog to process packets, but it does not support GRO either.
Btw, NAPI mode was originally used for hardening the kernel networking stack, but it would be interesting to see if it helps performance.
As a result, I change the sndbuf to a value such as 212992 in /proc/sys/net/core/wmem_default.
Can you describe your setup in detail? Where did you run the iperf server and client, and where did you change the wmem_default?
I use dpdk-testpmd to test the vhost-net performance, such as:
dpdk-testpmd -l 0-9 -n 4 --vdev=virtio_user0,path=/dev/vhost-net,queue_size=1024,mac=00:00:0a:00:00:02 -a 0000:06:00.1 -- -i --txd=1024 --rxd=1024
And I have changed the sndbuf in https://github.com/DPDK/dpdk/blob/main/drivers/net/virtio/virtio_user/vhost_kernel_tap.c#L169 to 212992, so it is not INT_MAX anymore. I also enabled NAPI in the tun module. The iperf server ran on the tap interface on the kernel side, which receives the TCP stream from dpdk-testpmd.
Are you doing TCP stream testing between two TAPs and using testpmd to forward traffic?
The test topology is as follows:
________________________
| |
iperf-server-----tap<------->testpmd<------> ixgbe<----------->ixgbe
(iperf client)
|_______________________|
The testpmd is used to forward traffic from another machine.
But the performance is greatly degraded, from 4.5 Gbps to 750 Mbps. I am confused about the perf result of the CPU core where the iperf server ran, which shows a serious bottleneck: 59.86% CPU in report_bug and 20.66% in module_find_bug.
This looks odd, you may want to check your perf; I don't think module_find_bug() will run in the datapath.
I use CentOS 8.2 with a native 4.18.0-193.el8.x86_64 kernel to test.
The kernel is kind of too old, I suggest testing a recent kernel version.
I will use a recent kernel to test it later.
Thanks
But the performance tested by iperf is greatly degraded, from 4.5 Gbps to 750Gbps per flow. I see the iperf server consumes 100% of a CPU core, which should be the bottleneck of this test. The perf top result of the iperf server CPU core is as follows:
'''
Samples: 72 of event 'cycles', 4000 Hz, Event count (approx.): 22685278 lost: 0/0 drop: 0/0
Overhead  Shared O  Symbol
  59.86%  [kernel]  [k] report_bug
  20.66%  [kernel]  [k] module_find_bug
   6.51%  [kernel]  [k] common_interrupt
   2.82%  [kernel]  [k] __slab_free
   1.48%  [kernel]  [k] copy_user_enhanced_fast_string
   1.44%  [kernel]  [k] __skb_datagram_iter
   1.42%  [kernel]  [k] notifier_call_chain
   1.41%  [kernel]  [k] irq_work_run_list
   1.41%  [kernel]  [k] update_irq_load_avg
   1.41%  [kernel]  [k] task_tick_fair
   1.41%  [kernel]  [k] cmp_ex_search
   0.16%  [kernel]  [k] __ghes_peek_estatus.isra.12
   0.02%  [kernel]  [k] acpi_os_read_memory
   0.00%  [kernel]  [k] native_apic_mem_write
'''
I am not clear about the test result. Can we change the sndbuf size in DPDK? Is there any way to enable vhost_net to use NAPI without changing the tun kernel driver?
You can do this by not using INT_MAX as sndbuf.
As mentioned above, I changed the sndbuf value and met a serious performance degradation.
Thanks
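
For readers who want to try the sndbuf change discussed in this thread outside of DPDK, the following is a minimal sketch and not the DPDK implementation (which does this in C in vhost_kernel_tap.c). It assumes a Linux host whose architecture uses the generic ioctl number encoding (for example x86_64 or arm64), a 64-bit struct ifreq layout, and a caller with CAP_NET_ADMIN. The helper names and the interface name in the usage note are hypothetical.

```python
import fcntl
import os
import struct

# Generic Linux ioctl encoding (x86, arm64). Assumption: some architectures
# (e.g. powerpc, mips) encode the direction bits differently.
_IOC_WRITE = 1
_IOC_READ = 2


def _ioc(direction, type_char, nr, size):
    """Build an ioctl request number the way the _IOW/_IOR macros do."""
    return (direction << 30) | (size << 16) | (ord(type_char) << 8) | nr


INT_SIZE = struct.calcsize("i")
TUNSETIFF = _ioc(_IOC_WRITE, "T", 202, INT_SIZE)
TUNGETSNDBUF = _ioc(_IOC_READ, "T", 211, INT_SIZE)
TUNSETSNDBUF = _ioc(_IOC_WRITE, "T", 212, INT_SIZE)

IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
INT_MAX = 2**31 - 1  # the tuntap default sndbuf mentioned in the thread


def read_wmem_default(path="/proc/sys/net/core/wmem_default"):
    """Return the default socket send buffer size, e.g. 212992."""
    with open(path) as f:
        return int(f.read().strip())


def open_tap_with_sndbuf(name, sndbuf):
    """Attach to a tap device, set its sndbuf, and read the value back.

    Returns (fd, sndbuf_reported_by_kernel). Needs CAP_NET_ADMIN.
    """
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        # struct ifreq: 16-byte name, 2-byte flags, padding up to 40 bytes.
        ifr = struct.pack("16sH22x", name.encode(), IFF_TAP | IFF_NO_PI)
        fcntl.ioctl(fd, TUNSETIFF, ifr)
        # The sndbuf ioctls only work once the fd is attached to a device.
        fcntl.ioctl(fd, TUNSETSNDBUF, struct.pack("i", sndbuf))
        raw = fcntl.ioctl(fd, TUNGETSNDBUF, struct.pack("i", 0))
        current = struct.unpack("i", raw)[0]
    except OSError:
        os.close(fd)
        raise
    return fd, current
```

A possible use, run as root, would be `fd, cur = open_tap_with_sndbuf("tap-test", read_wmem_default())`, with `INT_MAX` passed instead to restore the default behaviour. Whether a finite sndbuf is a good idea for throughput is exactly what this thread leaves open.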