Re: Regression in throughput between kvm guests over virtual bridge
From: Matthew Rosato <hidden>
Date: 2017-09-15 03:36:26
> Is the issue gone if you reduce VHOST_RX_BATCH to 1? It would also be helpful to collect a perf diff to see if anything interesting shows up. (Considering that 4.4 shows a more obvious regression, please use 4.4.)
The issue still exists when I force VHOST_RX_BATCH = 1.

I collected perf data with 4.12 as the baseline, 4.13 as delta1, and 4.13 + VHOST_RX_BATCH=1 as delta2. All guests are running 4.4. Same scenario as before: 2 uperf client guests and 2 uperf slave guests. I collected perf data against 1 uperf client process and 1 uperf slave process. Here are the significant diffs:

uperf client:

    Baseline  Delta1  Delta2  Shared Object      Symbol
      75.09%  +9.32%  +8.52%  [kernel.kallsyms]  [k] enabled_wait
       9.04%  -4.11%  -3.79%  [kernel.kallsyms]  [k] __copy_from_user
       2.30%  -0.79%  -0.71%  [kernel.kallsyms]  [k] arch_free_page
       2.17%  -0.65%  -0.58%  [kernel.kallsyms]  [k] arch_alloc_page
       0.69%  -0.25%  -0.24%  [kernel.kallsyms]  [k] get_page_from_freelist
       0.56%  +0.08%  +0.14%  [kernel.kallsyms]  [k] virtio_ccw_kvm_notify
       0.42%  -0.11%  -0.09%  [kernel.kallsyms]  [k] tcp_sendmsg
       0.31%  -0.15%  -0.14%  [kernel.kallsyms]  [k] tcp_write_xmit

uperf slave:

    Baseline  Delta1  Delta2  Shared Object      Symbol
      72.44%  +8.99%  +8.85%  [kernel.kallsyms]  [k] enabled_wait
       8.99%  -3.67%  -3.51%  [kernel.kallsyms]  [k] __copy_to_user
       2.31%  -0.71%  -0.67%  [kernel.kallsyms]  [k] arch_free_page
       2.16%  -0.67%  -0.63%  [kernel.kallsyms]  [k] arch_alloc_page
       0.89%  -0.14%  -0.11%  [kernel.kallsyms]  [k] virtio_ccw_kvm_notify
       0.71%  -0.30%  -0.30%  [kernel.kallsyms]  [k] get_page_from_freelist
       0.70%  -0.25%  -0.29%  [kernel.kallsyms]  [k] __wake_up_sync_key
       0.61%  -0.22%  -0.22%  [kernel.kallsyms]  [k] virtqueue_add_inbuf
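The delta columns in a perf diff are percentage-point differences from the baseline, so adding each delta to the baseline gives that symbol's share of samples on the newer kernel. Below is a small Python sketch of that arithmetic for the first four uperf client rows. The ROWS list and the absolute_shares helper are hypothetical and exist only for illustration; the numbers are copied from the table.

```python
"""Sketch: convert perf diff rows (baseline %, deltas in percentage points)
into absolute per-kernel sample shares. Column meaning is assumed to be
baseline = 4.12, delta1 = 4.13, delta2 = 4.13 + VHOST_RX_BATCH=1."""

# (symbol, baseline 4.12, delta1 4.13, delta2 4.13 + VHOST_RX_BATCH=1)
ROWS = [
    ("enabled_wait", 75.09, +9.32, +8.52),
    ("__copy_from_user", 9.04, -4.11, -3.79),
    ("arch_free_page", 2.30, -0.79, -0.71),
    ("arch_alloc_page", 2.17, -0.65, -0.58),
]


def absolute_shares(rows):
    """Return {symbol: (baseline, baseline + delta1, baseline + delta2)}.

    Sums are rounded to 2 decimal places to match perf's output precision.
    """
    return {
        sym: (base, round(base + d1, 2), round(base + d2, 2))
        for sym, base, d1, d2 in rows
    }


if __name__ == "__main__":
    for sym, (base, v413, v413_b1) in absolute_shares(ROWS).items():
        print(f"{sym:18s} 4.12={base:6.2f}%  4.13={v413:6.2f}%  "
              f"4.13+batch1={v413_b1:6.2f}%")
```

Read this way, enabled_wait rises from 75.09% to 84.41% of client samples on 4.13 (83.61% with VHOST_RX_BATCH=1). Over the same runs, __copy_from_user falls from 9.04% to 4.93% (5.25%). In both delta kernels the client spends proportionally more time waiting and less time copying data.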
> It may be worth trying to disable zerocopy, or to run the test from host to guest instead of guest to guest, to rule out a possible issue on the sender side.
With zerocopy disabled, I am still seeing the regression. The perf numbers provided above were collected with zerocopy enabled.

I replaced one uperf guest and instead ran that uperf client as a host process, pointing at a guest. All traffic still goes over the virtual bridge. In this setup, it's still easy to see the regression for the remaining guest1<->guest2 uperf run, but the host<->guest3 run does NOT exhibit a reliable regression pattern. The significant perf diffs from the host uperf process (baseline = 4.12, delta = 4.13):

    Baseline   Delta  Shared Object      Symbol
      59.96%  +5.03%  [kernel.kallsyms]  [k] enabled_wait
       6.47%  -2.27%  [kernel.kallsyms]  [k] raw_copy_to_user
       5.52%  -1.63%  [kernel.kallsyms]  [k] raw_copy_from_user
       0.87%  -0.30%  [kernel.kallsyms]  [k] get_page_from_freelist
       0.69%  +0.30%  [kernel.kallsyms]  [k] finish_task_switch
       0.66%  -0.15%  [kernel.kallsyms]  [k] swake_up
       0.58%  -0.00%  [vhost]            [k] vhost_get_vq_desc
       ...
       0.42%  +0.50%  [kernel.kallsyms]  [k] ckc_irq_pending

I also tried flipping the uperf stream around (a guest uperf client communicating with a uperf slave process on the host), and again I cannot see the regression pattern. So the regression seems to require a guest on both ends of the connection.
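For anyone trying to reproduce the zerocopy-disabled run: the mail does not say how zerocopy was turned off. One common way on the KVM host is the vhost_net module parameter experimental_zcopytx. The sketch below assumes that mechanism, assumes vhost_net is a loadable module rather than built in, and must run as root with no guests holding /dev/vhost-net open. It is not necessarily how these runs were configured.

```shell
#!/bin/sh
# Hedged sketch: reload vhost_net with TX zerocopy disabled on the host.
# Assumption: vhost_net is a module and no running guest is using it,
# otherwise the unload step fails (and set -e stops the script).
set -e
modprobe -r vhost_net
modprobe vhost_net experimental_zcopytx=0
# Expected to print 0 once the module is loaded with zerocopy disabled.
cat /sys/module/vhost_net/parameters/experimental_zcopytx
```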