Thread (233 messages) 233 messages, 15 authors, 2021-10-28

Re: [RFC] /dev/ioasid uAPI proposal

From: Alex Williamson <hidden>
Date: 2021-06-03 20:02:11
Also in: linux-iommu, lkml

On Thu, 3 Jun 2021 09:34:01 -0300
Jason Gunthorpe [off-list ref] wrote:
On Wed, Jun 02, 2021 at 08:50:54PM -0600, Alex Williamson wrote:
quoted
On Wed, 2 Jun 2021 19:45:36 -0300
Jason Gunthorpe [off-list ref] wrote:
  
quoted
On Wed, Jun 02, 2021 at 02:37:34PM -0600, Alex Williamson wrote:
  
quoted
Right.  I don't follow where you're jumping to relaying DMA_PTE_SNP
from the guest page table... what page table?      
I see my confusion now, the phrasing in your earlier remark led me
think this was about allowing the no-snoop performance enhancement in
some restricted way.

It is really about blocking no-snoop 100% of the time and then
disabling the dangerous wbinvd when the block is successful.

Didn't closely read the kvm code :\

If it was about allowing the optimization then I'd expect the guest to
enable no-snoopable regions via it's vIOMMU and realize them to the
hypervisor and plumb the whole thing through. Hence my remark about
the guest page tables..

So really the test is just 'were we able to block it' ?  
Yup.  Do we really still consider that there's some performance benefit
to be had by enabling a device to use no-snoop?  This seems largely a
legacy thing.  
I've recently had some no-snoopy discussions lately.. The issue didn't
vanish, it is still expensive going through all that cache hardware.
quoted
quoted
But Ok, back the /dev/ioasid. This answers a few lingering questions I
had..

1) Mixing IOMMU_CAP_CACHE_COHERENCY and !IOMMU_CAP_CACHE_COHERENCY
   domains.

   This doesn't actually matter. If you mix them together then kvm
   will turn on wbinvd anyhow, so we don't need to use the DMA_PTE_SNP
   anywhere in this VM.

   This if two IOMMU's are joined together into a single /dev/ioasid
   then we can just make them both pretend to be
   !IOMMU_CAP_CACHE_COHERENCY and both not set IOMMU_CACHE.  
Yes and no.  Yes, if any domain is !IOMMU_CAP_CACHE_COHERENCY then we
need to emulate wbinvd, but no we'll use IOMMU_CACHE any time it's
available based on the per domain support available.  That gives us the
most consistent behavior, ie. we don't have VMs emulating wbinvd
because they used to have a device attached where the domain required
it and we can't atomically remap with new flags to perform the same as
a VM that never had that device attached in the first place.  
I think we are saying the same thing..
Hrm?  I think I'm saying the opposite of your "both not set
IOMMU_CACHE".  IOMMU_CACHE is the mapping flag that enables
DMA_PTE_SNP.  Maybe you're using IOMMU_CACHE as the state reported to
KVM?
quoted
quoted
2) How to fit this part of kvm in some new /dev/ioasid world

   What we want to do here is iterate over every ioasid associated
   with the group fd that is passed into kvm.  
Yeah, we need some better names, binding a device to an ioasid (fd) but
then attaching a device to an allocated ioasid (non-fd)... I assume
you're talking about the latter ioasid.  
Fingers crossed on RFCv2.. Here I mean the IOASID object inside the
/dev/iommu FD. The vfio_device would have some kref handle to the
in-kernel representation of it. So we can interact with it..
quoted
quoted
   Or perhaps more directly: an op attaching the vfio_device to the
   kvm and having some simple helper 
         '(un)register ioasid with kvm (kvm, ioasid)'
   that the vfio_device driver can call that just sorts this out.  
We could almost eliminate the device notion altogether here, use an
ioasidfd_for_each_ioasid() but we really want a way to trigger on each
change to the composition of the device set for the ioasid, which is
why we currently do it on addition or removal of a group, where the
group has a consistent set of IOMMU properties.  
That is another quite good option, just forget about trying to be
highly specific and feed in the /dev/ioasid FD and have kvm ask "does
anything in here not enforce snoop?"

With something appropriate to track/block changing that answer.

It doesn't solve the problem to connect kvm to AP and kvmgt though
It does not, we'll probably need a vfio ioctl to gratuitously announce
the KVM fd to each device.  I think some devices might currently fail
their open callback if that linkage isn't already available though, so
it's not clear when that should happen, ie. it can't currently be a
VFIO_DEVICE ioctl as getting the device fd requires an open, but this
proposal requires some availability of the vfio device fd without any
setup, so presumably that won't yet call the driver open callback.
Maybe that's part of the attach phase now... I'm not sure, it's not
clear when the vfio device uAPI starts being available in the process
of setting up the ioasid.  Thanks,

Alex
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help