Thread: 39 messages, 5 authors, 2019-01-02

Re: [PATCH net-next 0/3] vhost: accelerate metadata access through vmap()

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2018-12-14 12:52:34
Also in: kvm, lkml

On Fri, Dec 14, 2018 at 12:29:54PM +0800, Jason Wang wrote:
On 2018/12/14 4:12 AM, Michael S. Tsirkin wrote:
On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
Hi:

This series tries to access virtqueue metadata through kernel virtual
addresses instead of copy_user() and friends, since those carry too much
overhead: checks, speculation barriers, or even hardware feature
toggling.

Test shows about 24% improvement on TX PPS. It should benefit other
cases as well.

Please review
I think the idea of speeding up userspace access is a good one.
However, I think that moving all checks to the start is way too aggressive.

So did packet and AF_XDP. Anyway, sharing the address space and accessing it
directly is the fastest way. Performance is the major consideration for
people choosing a backend. Compared to a userspace implementation, vhost does
not have security advantages at any level. If vhost is still slow, people
will start to develop backends based on e.g. AF_XDP.
Let them. What's wrong with that?
Instead, let's batch things up but let's not keep them
around forever.
Here are some ideas:


1. Disable preemption, process a small number of small packets
    directly in an atomic context. This should cut latency
    down significantly, the tricky part is to only do it
    on a light load and disable this
    for the streaming case otherwise it's unfair.
    This might fail, if it does just bounce things out to
    a thread.

I'm not sure what context you meant here. Is this for the TX path of TUN? But a
fundamental difference is that my series targets extremely heavy load, not a
light one; 100% CPU for vhost is expected.
Interesting. You only shared a TCP RR result though.
What's the performance gain in a heavy load case?
2. Switch to unsafe_put_user/unsafe_get_user,
    and batch up multiple accesses.
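For readers unfamiliar with the pattern behind idea 2, here is a minimal userspace sketch of why batching could matter. It is only a model, not vhost code: `model_access_begin()`, `model_access_end()` and the `window_opens` counter are hypothetical stand-ins for the per-access cost (for example SMAP toggling) that get_user()/put_user() pay on every call and that a user_access_begin() ... unsafe_get_user() ... user_access_end() sequence would pay once per batch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: counts how often the "user access window" is opened.
 * In the kernel this would correspond to costs such as SMAP toggling. */
static unsigned long window_opens;

static void model_access_begin(void) { window_opens++; }
static void model_access_end(void) { }

/* One window per element, like calling get_user() in a loop. */
static uint64_t sum_per_access(const uint16_t *ring, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        model_access_begin();
        sum += ring[i];
        model_access_end();
    }
    return sum;
}

/* One window for the whole batch, like a single begin/end pair
 * wrapped around several unsafe accesses. */
static uint64_t sum_batched(const uint16_t *ring, size_t n)
{
    uint64_t sum = 0;

    model_access_begin();
    for (size_t i = 0; i < n; i++)
        sum += ring[i];
    model_access_end();
    return sum;
}
```

Both functions return the same sum; only the number of window openings differs (n versus 1). Whether that translates into a measurable gain for vhost is exactly what is being debated in this thread.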

As I said, it won't help unless we can batch accesses to two different places
among the three (avail, descriptor and used); batching accesses to a
single place like used does not help. I'm not even sure this can be done,
considering that with a packed virtqueue we have a single descriptor ring.
So that's one of the reasons packed should be faster. A single access
and you get the descriptor, no messy redirects. Somehow your
benchmarking so far didn't show a gain with vhost and
packed though - do you know what's wrong?
Batching through
unsafe helpers may not help in this case since they are equivalent to the safe
ones. This also requires non-trivial refactoring of vhost, and such refactoring
may itself have a noticeable impact (e.g. it may lead to regressions).

3. Allow adding a fixup point manually,
    such that multiple independent get_user accesses
    can get a single fixup (will allow better compiler
    optimizations).
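As an illustration of idea 3 only, the sketch below models several independent reads that share one error path. It is a userspace approximation under stated assumptions: `MODEL_GET` is a hypothetical macro in which a NULL source plays the role of a faulting user access, and the shared `fault` label plays the role of the single fixup point. Real kernel fixups are exception-table entries, which this does not reproduce.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a faulting access: a NULL source "faults"
 * and control jumps to the caller-supplied label. */
#define MODEL_GET(dst, src, fault_label) \
    do {                                 \
        if ((src) == NULL)               \
            goto fault_label;            \
        (dst) = *(src);                  \
    } while (0)

/* Reads three independent fields; a fault in any of them lands on the
 * one shared label instead of being checked after each access. */
static int read_three(const uint16_t *a, const uint16_t *b,
                      const uint16_t *c, uint32_t *out)
{
    uint16_t x, y, z;

    MODEL_GET(x, a, fault);
    MODEL_GET(y, b, fault);
    MODEL_GET(z, c, fault);
    *out = (uint32_t)x + y + z;
    return 0;

fault:
    return -1; /* kernel code would typically return -EFAULT here */
}
```

With a single error path the success path has no per-access return-value checks, which is the kind of freedom the compiler is expected to benefit from.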
So for metadata access, I don't see how what you suggest here can help in the
case of a heavy workload.

For data access, this may help, but I've tried batching the data copy to
reduce SMAP/speculation barriers in vhost-net and I didn't see a performance
improvement.

Thanks
So how about we try to figure out what's actually going on?
Can you drop the barriers and show the same gain?
E.g. vmap does not use a huge page IIRC so in fact it
can be slower than direct access. It's not a magic
faster way.


Jason Wang (3):
   vhost: generalize adding used elem
   vhost: fine grain userspace memory accessors
   vhost: access vq metadata through kernel virtual address

  drivers/vhost/vhost.c | 281 ++++++++++++++++++++++++++++++++++++++----
  drivers/vhost/vhost.h |  11 ++
  2 files changed, 266 insertions(+), 26 deletions(-)

-- 
2.17.1