Thread: 42 messages, 5 authors, 2017-11-28

Re: Regression in throughput between kvm guests over virtual bridge

From: Matthew Rosato <hidden>
Date: 2017-10-18 20:17:57

On 10/12/2017 02:31 PM, Wei Xu wrote:
> On Thu, Oct 05, 2017 at 04:07:45PM -0400, Matthew Rosato wrote:
>> Ping...  Jason, any other ideas or suggestions?
>
> Hi Matthew,
> Recently I have been doing a similar test on x86 for this patch; here are some
> differences between our testbeds.
>
> 1. It is nice that you got an improvement with 50+ instances (or connections here?),
> which would be quite helpful in addressing the issue, and that you've figured out the
> cost (wait/wakeup). A kind reminder: did you pin the uperf client/server along the whole
> path, besides the vhost and vcpu threads?

Was not previously doing any pinning whatsoever, just reproducing an
environment that one of our testers here was running.  Reducing guest
vcpu count from 4->1, still see the regression.  Then, pinned each vcpu
thread and vhost thread to a separate host CPU -- still made no
difference (regression still present).
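
For readers who want to repeat the pinning step, a minimal sketch follows. It is not the procedure actually used in this thread (which does not say how the pinning was done); it assumes a Linux host, and the thread IDs and CPU numbers are hypothetical placeholders that would have to be looked up for the real vcpu and vhost threads.

```python
import os


def plan_pinning(thread_ids, host_cpus):
    """Map each thread ID to its own host CPU, one-to-one.

    thread_ids and host_cpus are supplied by the caller; the values used
    in any example call are hypothetical, not taken from the thread.
    """
    if len(set(host_cpus)) != len(host_cpus):
        raise ValueError("host CPUs must be distinct")
    if len(thread_ids) > len(host_cpus):
        raise ValueError("not enough host CPUs for a one-to-one pinning")
    return dict(zip(thread_ids, host_cpus))


def apply_pinning(plan):
    """Restrict each thread to its single assigned CPU (Linux only)."""
    for tid, cpu in plan.items():
        # On Linux the first argument may be a thread ID, so individual
        # vcpu and vhost threads can each be pinned separately.
        os.sched_setaffinity(tid, {cpu})
```

Calling `apply_pinning(plan_pinning(tids, cpus))` with the vcpu and vhost thread IDs would give each thread a separate host CPU, matching the setup described above.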

> 2. It might be useful to shorten the traffic path as a reference. What I am running
> is briefly like:
>     pktgen(host kernel) -> tap(x) -> guest(DPDK testpmd)
>
> The bridge driver (br_forward(), etc.) might impact performance, in my personal
> experience, so eventually I settled on this simplified testbed, which fully
> isolates the traffic from both userspace and the host kernel stack (1 and 50 instances,
> bridge driver, etc.) and therefore reduces potential interference.
>
> The downside of this is that it needs DPDK support in the guest; has this ever been
> run on an s390x guest? An alternative approach is to directly run XDP drop on the
> virtio-net NIC in the guest, though this requires compiling XDP inside the guest, which needs
> a newer distro (Fedora 25+ in my case, or Ubuntu 16.10, not sure).

I made an attempt at DPDK, but it has not been run on s390x as far as
I'm aware and didn't seem trivial to get working.

So instead I took your alternate suggestion & did:
pktgen(host) -> tap(x) -> guest(xdp_drop)

When running this setup, I am not able to reproduce the regression.  As
mentioned previously, I am also unable to reproduce when running one end
of the uperf connection from the host - I have only ever been able to
reproduce when both ends of the uperf connection are running within a guest.
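
The host side of the pktgen -> tap -> guest path above can be sketched as follows. This is only an illustration under stated assumptions: the thread does not give the pktgen parameters that were used, so the device name, MAC address, packet count and packet size are hypothetical. The command names (`rem_device_all`, `add_device`, `count`, `pkt_size`, `dst_mac`, `start`) are those of the kernel pktgen /proc interface.

```python
PKTGEN_ROOT = "/proc/net/pktgen"


def pktgen_commands(dev, dst_mac, count=1000000, pkt_size=64, thread=0):
    """Build the (proc file, command) pairs for a single-device pktgen run.

    All argument values are placeholders chosen by the caller; none of
    them come from the original test setup.
    """
    kthread = f"{PKTGEN_ROOT}/kpktgend_{thread}"
    device = f"{PKTGEN_ROOT}/{dev}"
    return [
        (kthread, "rem_device_all"),      # clear devices bound to this pktgen thread
        (kthread, f"add_device {dev}"),   # bind the tap device to it
        (device, f"count {count}"),       # number of packets to send
        (device, f"pkt_size {pkt_size}"), # packet size in bytes
        (device, f"dst_mac {dst_mac}"),   # MAC of the guest's virtio-net NIC
        (f"{PKTGEN_ROOT}/pgctrl", "start"),
    ]


def run_pktgen(commands):
    """Write each command to its proc file (needs root and the pktgen module)."""
    for path, command in commands:
        with open(path, "w") as proc_file:
            proc_file.write(command + "\n")
```

For example, `run_pktgen(pktgen_commands("tap0", "52:54:00:12:34:56"))` would drive traffic into a tap device named `tap0`; both values are made up for illustration.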

> 3. BTW, did you enable hugepages for your guest? It would affect performance more
> or less, depending on the memory demand when generating traffic. I didn't see
> similar command lines in yours.

s390x does not currently support passing through hugetlb backing via
QEMU mem-path.