Thread (119 messages) 119 messages, 8 authors, 2018-09-10

Re: [RFC 0/4] Virtio uses DMA API for all devices

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2018-08-02 21:51:46
Also in: linuxppc-dev, virtualization

On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote:
On Thu, 2018-08-02 at 23:52 +0300, Michael S. Tsirkin wrote:
quoted
quoted
Yes, this is the purpose of Anshuman original patch (I haven't looked
at the details of the patch in a while but that's what I told him to
implement ;-) :

  - Make virtio always use DMA ops to simplify the code path (with a set
of "transparent" ops for legacy)

  and

  -  Provide an arch hook allowing us to "override" those "transparent"
DMA ops with some custom ones that do the appropriate swiotlb gunk.

Cheers,
Ben.
Right but as I tried to say doing that brings us to a bunch of issues
with using DMA APIs in virtio. Put simply DMA APIs weren't designed for
guest to hypervisor communication.
I'm not sure I see the problem, see below
quoted
When we do (as is the case with PLATFORM_IOMMU right now) this adds a
bunch of overhead which we need to get rid of if we are to switch to
PLATFORM_IOMMU by default.  We need to fix that.
So let's differenciate the two problems of having an IOMMU (real or
emulated) which indeeds adds overhead etc... and using the DMA API.
Well actually it's the other way around. An iommu in theory doesn't need
to bring overhead if you set it in bypass mode.  Which does imply the
iommu supports bypass mode. Is that universally the case?  DMA API does
see Christoph's list of things it does some of which add overhead.
At the moment, virtio does this all over the place:

	if (use_dma_api)
		dma_map/alloc_something(...)
	else
		use_pa

The idea of the patch set is to do two, somewhat orthogonal, changes
that together achieve what we want. Let me know where you think there
is "a bunch of issues" because I'm missing it:

 1- Replace the above if/else constructs with just calling the DMA API,
and have virtio, at initialization, hookup its own dma_ops that just
"return pa" (roughly) when the IOMMU stuff isn't used.

This adds an indirect function call to the path that previously didn't
have one (the else case above). Is that a significant/measurable
overhead ?
Seems to be :( Jason reports about 4%. I wonder whether we can support
map_sg and friends being NULL, then use that when mapping is an
identity. A conditional branch there is likely very cheap.

Would this cover all platforms with kvm (which is where we care
most about performance)?
This change stands alone, and imho "cleans" up virtio by avoiding all
that if/else "2 path" and unless it adds a measurable overhead, should
probably be done.

 2- Make virtio use the DMA API with our custom platform-provided
swiotlb callbacks when needed, that is when not using IOMMU *and*
running on a secure VM in our case.

This benefits from -1- by making us just plumb in a different set of
DMA ops we would have cooked up specifically for virtio in our arch
code (or in virtio itself but build arch-conditionally in a separate
file). But it doesn't strictly need it -1-:

Now, -2- doesn't strictly needs -1-. We could have just done another
xen-like hack that forces the DMA API "ON" for virtio when running in a
secure VM.

The problem if we do that however is that we also then need the arch
PCI code to make sure it hooks up the virtio PCI devices with the
special "magic" DMA ops that avoid the iommu but still do swiotlb, ie,
not the same as other PCI devices. So it will have to play games such
as checking vendor/device IDs for virtio, checking the IOMMU flag,
etc... from the arch code which really bloody sucks when assigning PCI
DMA ops.

However, if we do it the way we plan here, on top of -1-, with a hook
called from virtio into the arch to "override" the virtio DMA ops, then
we avoid the problem completely: The arch hook would only be called by
virtio if the IOMMU flag is *not* set. IE only when using that special
"hypervisor" iommu bypass. If the IOMMU flag is set, virtio uses normal
PCI dma ops as usual.

That way, we have a very clear semantic: This hook is purely about
replacing those "null" DMA ops that just return PA introduced in -1-
with some arch provided specially cooked up DMA ops for non-IOMMU
virtio that know about the arch special requirements. For us bounce
buffering.

Is there something I'm missing ?

Cheers,
Ben.
Right so I was trying to write it up in a systematic way, but just to
give you one example, if there is a system where DMA API handles
coherency issues, or flushing of some buffers, then our PLATFORM_IOMMU
flag causes that to happen.

And we kinda worked around this without the IOMMU by basically saying
"ok we do not really need DMA API so let's just bypass it" and
it was kind of ok except now everyone is switching to vIOMMU
just in case. So now people do want some parts of what DMA API does,
such as the bounce buffer use, or IOMMU mappings.

And maybe in the end the solution is going to be to do something similar
to virt_Xmb except for DMA APIs: add APIs that handle just the
addressing bits but without the overhead. See
commit 6a65d26385bf487926a0616650927303058551e3
    asm-generic: implement virt_xxx memory barriers
for reference, it's a similar set of issues.

So it's not a problem with your patches as such, it's just that
they don't solve that harder problem.

-- 
MST
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help