Re: [RFC 0/4] Virtio uses DMA API for all devices
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2018-08-02 21:51:46
Also in:
linuxppc-dev, virtualization
On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote:
On Thu, 2018-08-02 at 23:52 +0300, Michael S. Tsirkin wrote:quoted
quoted
Yes, this is the purpose of Anshuman original patch (I haven't looked at the details of the patch in a while but that's what I told him to implement ;-) : - Make virtio always use DMA ops to simplify the code path (with a set of "transparent" ops for legacy) and - Provide an arch hook allowing us to "override" those "transparent" DMA ops with some custom ones that do the appropriate swiotlb gunk. Cheers, Ben.Right but as I tried to say doing that brings us to a bunch of issues with using DMA APIs in virtio. Put simply DMA APIs weren't designed for guest to hypervisor communication.I'm not sure I see the problem, see belowquoted
When we do (as is the case with PLATFORM_IOMMU right now) this adds a bunch of overhead which we need to get rid of if we are to switch to PLATFORM_IOMMU by default. We need to fix that.So let's differenciate the two problems of having an IOMMU (real or emulated) which indeeds adds overhead etc... and using the DMA API.
Well actually it's the other way around. An iommu in theory doesn't need to bring overhead if you set it in bypass mode. Which does imply the iommu supports bypass mode. Is that universally the case? DMA API does see Christoph's list of things it does some of which add overhead.
At the moment, virtio does this all over the place: if (use_dma_api) dma_map/alloc_something(...) else use_pa The idea of the patch set is to do two, somewhat orthogonal, changes that together achieve what we want. Let me know where you think there is "a bunch of issues" because I'm missing it: 1- Replace the above if/else constructs with just calling the DMA API, and have virtio, at initialization, hookup its own dma_ops that just "return pa" (roughly) when the IOMMU stuff isn't used. This adds an indirect function call to the path that previously didn't have one (the else case above). Is that a significant/measurable overhead ?
Seems to be :( Jason reports about 4%. I wonder whether we can support map_sg and friends being NULL, then use that when mapping is an identity. A conditional branch there is likely very cheap. Would this cover all platforms with kvm (which is where we care most about performance)?
This change stands alone, and imho "cleans" up virtio by avoiding all that if/else "2 path" and unless it adds a measurable overhead, should probably be done. 2- Make virtio use the DMA API with our custom platform-provided swiotlb callbacks when needed, that is when not using IOMMU *and* running on a secure VM in our case. This benefits from -1- by making us just plumb in a different set of DMA ops we would have cooked up specifically for virtio in our arch code (or in virtio itself but build arch-conditionally in a separate file). But it doesn't strictly need it -1-: Now, -2- doesn't strictly needs -1-. We could have just done another xen-like hack that forces the DMA API "ON" for virtio when running in a secure VM. The problem if we do that however is that we also then need the arch PCI code to make sure it hooks up the virtio PCI devices with the special "magic" DMA ops that avoid the iommu but still do swiotlb, ie, not the same as other PCI devices. So it will have to play games such as checking vendor/device IDs for virtio, checking the IOMMU flag, etc... from the arch code which really bloody sucks when assigning PCI DMA ops. However, if we do it the way we plan here, on top of -1-, with a hook called from virtio into the arch to "override" the virtio DMA ops, then we avoid the problem completely: The arch hook would only be called by virtio if the IOMMU flag is *not* set. IE only when using that special "hypervisor" iommu bypass. If the IOMMU flag is set, virtio uses normal PCI dma ops as usual. That way, we have a very clear semantic: This hook is purely about replacing those "null" DMA ops that just return PA introduced in -1- with some arch provided specially cooked up DMA ops for non-IOMMU virtio that know about the arch special requirements. For us bounce buffering. Is there something I'm missing ? Cheers, Ben.
Right so I was trying to write it up in a systematic way, but just to
give you one example, if there is a system where DMA API handles
coherency issues, or flushing of some buffers, then our PLATFORM_IOMMU
flag causes that to happen.
And we kinda worked around this without the IOMMU by basically saying
"ok we do not really need DMA API so let's just bypass it" and
it was kind of ok except now everyone is switching to vIOMMU
just in case. So now people do want some parts of what DMA API does,
such as the bounce buffer use, or IOMMU mappings.
And maybe in the end the solution is going to be to do something similar
to virt_Xmb except for DMA APIs: add APIs that handle just the
addressing bits but without the overhead. See
commit 6a65d26385bf487926a0616650927303058551e3
asm-generic: implement virt_xxx memory barriers
for reference, it's a similar set of issues.
So it's not a problem with your patches as such, it's just that
they don't solve that harder problem.
--
MST