Thread: 42 messages, 5 authors, 2017-11-28

Re: Regression in throughput between kvm guests over virtual bridge

From: Matthew Rosato <hidden>
Date: 2017-09-15 03:36:26

> Is the issue gone if you reduce VHOST_RX_BATCH to 1? It would also be
> helpful to collect a perf diff to see if anything interesting shows up.
> (Considering 4.4 shows a more obvious regression, please use 4.4.)

Issue still exists when I force VHOST_RX_BATCH = 1.

Collected perf data, with 4.12 as the baseline, 4.13 as delta1 and
4.13+VHOST_RX_BATCH=1 as delta2. All guests running 4.4.  Same scenario,
2 uperf client guests, 2 uperf slave guests - I collected perf data
against 1 uperf client process and 1 uperf slave process.  Here are the
significant diffs:

uperf client:

75.09%   +9.32%   +8.52%  [kernel.kallsyms]   [k] enabled_wait
 9.04%   -4.11%   -3.79%  [kernel.kallsyms]   [k] __copy_from_user
 2.30%   -0.79%   -0.71%  [kernel.kallsyms]   [k] arch_free_page
 2.17%   -0.65%   -0.58%  [kernel.kallsyms]   [k] arch_alloc_page
 0.69%   -0.25%   -0.24%  [kernel.kallsyms]   [k] get_page_from_freelist
 0.56%   +0.08%   +0.14%  [kernel.kallsyms]   [k] virtio_ccw_kvm_notify
 0.42%   -0.11%   -0.09%  [kernel.kallsyms]   [k] tcp_sendmsg
 0.31%   -0.15%   -0.14%  [kernel.kallsyms]   [k] tcp_write_xmit

uperf slave:

72.44%   +8.99%   +8.85%  [kernel.kallsyms]   [k] enabled_wait
 8.99%   -3.67%   -3.51%  [kernel.kallsyms]   [k] __copy_to_user
 2.31%   -0.71%   -0.67%  [kernel.kallsyms]   [k] arch_free_page
 2.16%   -0.67%   -0.63%  [kernel.kallsyms]   [k] arch_alloc_page
 0.89%   -0.14%   -0.11%  [kernel.kallsyms]   [k] virtio_ccw_kvm_notify
 0.71%   -0.30%   -0.30%  [kernel.kallsyms]   [k] get_page_from_freelist
 0.70%   -0.25%   -0.29%  [kernel.kallsyms]   [k] __wake_up_sync_key
 0.61%   -0.22%   -0.22%  [kernel.kallsyms]   [k] virtqueue_add_inbuf
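
For anyone who wants to rank rows like the ones above mechanically, here is a minimal sketch in Python. It assumes the column layout exactly as pasted in this mail (baseline percentage, one or more signed deltas, the DSO in brackets, then the symbol). The names parse_perf_diff and largest_shifts are made up for illustration and are not part of perf.

```python
import re

# One row as pasted in this mail: baseline %, one or more signed deltas,
# [dso], a one-letter symbol type such as [k], then the symbol name.
ROW = re.compile(
    r"^\s*(?P<base>\d+\.\d+)%"
    r"(?P<deltas>(?:\s+[+-]\d+\.\d+%)+)"
    r"\s+\[(?P<dso>[^\]]+)\]\s+\[\w\]\s+(?P<sym>\S+)\s*$"
)


def parse_perf_diff(text):
    """Return a list of (symbol, baseline_pct, [delta_pct, ...]) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # skip headers, blank lines, and elided "..." rows
        deltas = [float(d.rstrip("%")) for d in m.group("deltas").split()]
        rows.append((m.group("sym"), float(m.group("base")), deltas))
    return rows


def largest_shifts(rows, column=0, top=3):
    """Rows with the largest absolute change in the chosen delta column."""
    return sorted(rows, key=lambda r: abs(r[2][column]), reverse=True)[:top]


# A few rows copied from the uperf client table in this mail.
SAMPLE = """\
75.09%   +9.32%   +8.52%  [kernel.kallsyms]   [k] enabled_wait
 9.04%   -4.11%   -3.79%  [kernel.kallsyms]   [k] __copy_from_user
 2.30%   -0.79%   -0.71%  [kernel.kallsyms]   [k] arch_free_page
 0.56%   +0.08%   +0.14%  [kernel.kallsyms]   [k] virtio_ccw_kvm_notify
"""

if __name__ == "__main__":
    for sym, base, deltas in largest_shifts(parse_perf_diff(SAMPLE)):
        print(f"{sym:28s} base={base:6.2f}% deltas={deltas}")
```

Sorting on the absolute delta keeps both the growth in enabled_wait and the matching drop in the copy routines near the top, which is the pattern of interest in both tables.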

> It may be worth trying to disable zerocopy, or doing the test from host
> to guest instead of guest to guest, to exclude a possible issue on the
> sender side.

With zerocopy disabled, still seeing the regression.  The provided perf
#s have zerocopy enabled.

I replaced 1 uperf guest and instead ran that uperf client as a host
process, pointing at a guest.  All traffic still over the virtual
bridge.  In this setup, it's still easy to see the regression for the
remaining guest1<->guest2 uperf run, but the host<->guest3 run does NOT
exhibit a reliable regression pattern.  The significant perf diffs from
the host uperf process (baseline=4.12, delta=4.13):


59.96%   +5.03%  [kernel.kallsyms]           [k] enabled_wait
 6.47%   -2.27%  [kernel.kallsyms]           [k] raw_copy_to_user
 5.52%   -1.63%  [kernel.kallsyms]           [k] raw_copy_from_user
 0.87%   -0.30%  [kernel.kallsyms]           [k] get_page_from_freelist
 0.69%   +0.30%  [kernel.kallsyms]           [k] finish_task_switch
 0.66%   -0.15%  [kernel.kallsyms]           [k] swake_up
 0.58%   -0.00%  [vhost]                     [k] vhost_get_vq_desc
   ...
 0.42%   +0.50%  [kernel.kallsyms]           [k] ckc_irq_pending

I also tried flipping the uperf stream around (a guest uperf client
communicating with a slave uperf process on the host) and again could
not see the regression pattern.  So it seems to require a guest on both
ends of the connection.