Re: [RFC PATCH 12/21] KVM: IOMMUFD: MEMFD: Map private pages
From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2024-08-29 12:15:54
Also in: kvm, linux-iommu, linux-pci
On Thu, Aug 29, 2024 at 05:34:52PM +0800, Xu Yilun wrote:
> On Mon, Aug 26, 2024 at 09:30:24AM -0300, Jason Gunthorpe wrote:
>> On Mon, Aug 26, 2024 at 08:39:25AM +0000, Tian, Kevin wrote:
>>>> IOMMUFD calls get_user_pages() for every mapping, which will allocate shared memory instead of using private memory managed by KVM and MEMFD.
>>>>
>>>> Add support for an IOMMUFD fd to the VFIO KVM device's KVM_DEV_VFIO_FILE API, similar to the already existing VFIO device and VFIO group fds. This addition registers the KVM in IOMMUFD with a callback to get a pfn for guest private memory, for mapping it later in the IOMMU. There is no callback for freeing, as that is a generic folio_put() for now.
>>>>
>>>> The aforementioned callback uses uptr to calculate the offset into the KVM memory slot and find the private backing pfn; it pretty much copies kvm_gmem_get_pfn().
>>>>
>>>> This relies on private pages being pinned beforehand.
>>>
>>> There was a related discussion [1] which leans toward the conclusion that the IOMMU page table for private memory will be managed by the secure world, i.e. the KVM path.
>>
>> It is still effectively true. AMD's design has duplication: the RMP table has the mappings to validate GPA, and that is all managed in the secure world. They just want another copy of that information in the unsecure world in the form of page tables :\
>>> btw, going down this path it's clearer to extend the MAP_DMA uAPI to accept {gmemfd, offset} than to add a callback to KVM.
>>
>> Yes, we want a DMA MAP from memfd sort of API in general. So it should go directly to guest memfd with no kvm entanglement.
>
> A uAPI like ioctl(MAP_DMA, gmemfd, offset, iova) still means userspace takes control of the IOMMU mapping in the unsecure world.
Yes, such is how it seems to work. It doesn't actually have much control: it has to build a mapping that matches the RMP table exactly, but it still has to build it.
> But as mentioned, the unsecure world mapping is just a "copy" and has no generic meaning without the CoCo-VM context. There seems to be no need for userspace to repeat the "copy" for the IOMMU.
Well, here I mean a copy of the information already in the PSP secure world in the form of their RMP, just in a different format. There is another copy in KVM in its stage 2 translation, but..
> Maybe userspace could just find a way to link the KVM context to the IOMMU in the first place, then let KVM and the IOMMU negotiate the mapping directly at runtime.
I think the KVM folks have said no to sharing the KVM stage 2 directly with the iommu. They do too many operations that are incompatible with the iommu requirements for the stage 2. Whether that is also true for confidential compute, I don't know.

Still, continuing to duplicate the two mappings as we have always done seems like a reasonable place to start, and we want a memfd map anyhow for other reasons:

https://lore.kernel.org/linux-iommu/20240806125602.GJ478300@nvidia.com/

Jason