On Wed, Aug 08, 2018 at 11:18:13PM +1000, Benjamin Herrenschmidt wrote:
Sure, but all of this is just the configuration of the iommu. But I
think we agree here, and your point remains valid, indeed my proposed
hack:
if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())
Will only work if the IOMMU and non-IOMMU path are completely equivalent.
We can provide that guarantee for our secure VM case, but not generally, so if
we were to go down the route of a quirk in virtio, it might be better to
make it painfully obvious that it's specific to that one case with a different
kind of turd:
-	if (xen_domain())
+	if (xen_domain() || pseries_secure_vm())
 		return true;
I don't think it's pseries specific actually. E.g. I suspect AMD SEV
might benefit from the same kind of hack.
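To make the quirk concrete, here is a rough userspace model of the check under discussion. It is only a sketch: struct guest_env, has_feature() and model_use_dma_api() are names made up for illustration, pseries_secure_vm is the helper proposed above and does not exist yet, and the real decision lives in the virtio ring code rather than in anything like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number as defined by the virtio spec. */
#define VIRTIO_F_IOMMU_PLATFORM 33

/* Hypothetical stand-in for the guest environment; not a kernel type. */
struct guest_env {
	bool xen_domain;        /* models xen_domain() */
	bool pseries_secure_vm; /* models the proposed pseries_secure_vm() */
};

/* Test one bit in a 64-bit feature mask. */
static bool has_feature(uint64_t features, unsigned int bit)
{
	return (features >> bit) & 1;
}

/* Model of the decision: true means the ring goes through the DMA API. */
static bool model_use_dma_api(uint64_t features, const struct guest_env *env)
{
	/* Device asked for platform IOMMU handling: always use the DMA API. */
	if (has_feature(features, VIRTIO_F_IOMMU_PLATFORM))
		return true;

	/* Quirk path: environments where bypassing the DMA API cannot work. */
	if (env->xen_domain || env->pseries_secure_vm)
		return true;

	/* Legacy behaviour: hand guest physical addresses to the device. */
	return false;
}

/* Small self-check covering the cases discussed in this thread. */
static void model_self_check(void)
{
	struct guest_env plain = { false, false };
	struct guest_env secure = { false, true };

	assert(!model_use_dma_api(0, &plain));
	assert(model_use_dma_api(0, &secure));
	assert(model_use_dma_api(1ULL << VIRTIO_F_IOMMU_PLATFORM, &plain));
}
```

The point of the model is that the secure VM case takes the DMA API path without the device ever advertising VIRTIO_F_IOMMU_PLATFORM, which is why it only needs a guest-side change.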
So to summarize, and make sure I'm not missing something, the three approaches
at hand are:
1- The above, which is a one liner and contained in the guest, so that's nice, but
also means another turd in virtio which isn't ...
2- We force pseries to always set VIRTIO_F_IOMMU_PLATFORM, but with the current
architecture on our side that will force virtio to always go through an emulated
iommu, as pseries doesn't have the concept of a real bypass window, and thus will
impact performance for both secure and non-secure VMs.
3- Invent a property that can be put in selected PCI device tree nodes that
indicates that for that device specifically, the iommu can be bypassed, along with
a hypercall to turn that bypass on/off. Virtio would then use VIRTIO_F_IOMMU_PLATFORM
but its DT nodes would also have that property and Linux would notice it and turn
bypass on.
For completeness, virtio could also have its own bounce buffer
outside of DMA API one. I don't see lots of benefits to this
though.
The resulting properties of those options are:
1- Is what I want because it's the simplest, provides the best performance now,
and works without code changes to qemu or non-secure Linux. However it does
add a tiny turd to virtio which is annoying.
2- This works but it puts the iommu in the way always, thus reducing virtio performance
across the board for pseries unless we only do that for secure VMs but that is
difficult (as discussed earlier).
3- This would recover the performance lost in -2-, however it requires qemu *and*
guest changes. Specifically, existing guests (RHEL 7 etc...) would get the
performance hit of -2- unless modified to call that 'enable bypass' call, which
isn't great.
So imho we have to choose one of 3 not-great solutions here... Unless I missed
something in your ideas of course.
Cheers,
Ben.