Thread: 128 messages, 13 authors, 2024-09-25

Re: [RFC PATCH 12/21] KVM: IOMMUFD: MEMFD: Map private pages

From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2024-08-29 12:15:54
Also in: kvm, linux-iommu, linux-pci

On Thu, Aug 29, 2024 at 05:34:52PM +0800, Xu Yilun wrote:
On Mon, Aug 26, 2024 at 09:30:24AM -0300, Jason Gunthorpe wrote:
On Mon, Aug 26, 2024 at 08:39:25AM +0000, Tian, Kevin wrote:
IOMMUFD calls get_user_pages() for every mapping which will allocate
shared memory instead of using private memory managed by the KVM and
MEMFD.

Add support for IOMMUFD fd to the VFIO KVM device's KVM_DEV_VFIO_FILE
API similar to already existing VFIO device and VFIO group fds.
This addition registers the KVM in IOMMUFD with a callback to get a pfn
for guest private memory for mapping it later in the IOMMU.
No callback for free as it is generic folio_put() for now.

The aforementioned callback uses uptr to calculate the offset into
the KVM memory slot and find private backing pfn, copies
kvm_gmem_get_pfn() pretty much.

This relies on private pages to be pinned beforehand.
There was a related discussion [1] which leans toward the conclusion
that the IOMMU page table for private memory will be managed by
the secure world i.e. the KVM path.
It is still effectively true, AMD's design has duplication, the RMP
table has the mappings to validate GPA and that is all managed in the
secure world.

They just want another copy of that information in the unsecure world
in the form of page tables :\
btw going down this path it's clearer to extend the MAP_DMA
uAPI to accept {gmemfd, offset} than adding a callback to KVM.
Yes, we want a DMA MAP from memfd sort of API in general. So it should
go directly to guest memfd with no kvm entanglement.
A uAPI like ioctl(MAP_DMA, gmemfd, offset, iova) still means userspace
takes control of the IOMMU mapping in the unsecure world. 
Yes, such is how it seems to work.

It doesn't actually have much control, it has to build a mapping that
matches the RMP table exactly but still has to build it..
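
For illustration only, a minimal userspace sketch of what the arguments to
such a "map from {gmemfd, offset}" call might carry, plus the basic sanity
checks one would expect on them. Every name here (sketch_map_dma_file,
sketch_map_args_valid, SKETCH_PAGE_SIZE) is hypothetical and is not an
existing iommufd or KVM interface; the 4 KiB granule is also an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL /* assumed 4 KiB mapping granule */

/* Hypothetical argument block; NOT an existing iommufd uAPI struct. */
struct sketch_map_dma_file {
	int32_t  gmemfd;  /* guest_memfd file descriptor backing the mapping */
	uint64_t offset;  /* byte offset into the file */
	uint64_t length;  /* number of bytes to map */
	uint64_t iova;    /* I/O virtual address to map at */
};

/*
 * Reject arguments that are obviously malformed: a negative fd, an empty
 * range, unaligned values, or ranges that wrap around 64 bits. The wrap
 * check is conservative and also rejects a range ending exactly at 2^64.
 */
static bool sketch_map_args_valid(const struct sketch_map_dma_file *a)
{
	if (a->gmemfd < 0 || a->length == 0)
		return false;
	if ((a->offset | a->length | a->iova) & (SKETCH_PAGE_SIZE - 1))
		return false;
	if (a->offset + a->length < a->offset)
		return false;
	if (a->iova + a->length < a->iova)
		return false;
	return true;
}
```

Nothing in this sketch touches the RMP or any page table; it only shows the
shape of the request that userspace would have to build.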
But as mentioned, the unsecure world mapping is just a "copy" and
has no generic meaning without the CoCo-VM context. Seems no need
for userspace to repeat the "copy" for IOMMU.
Well, here I say copy from the information already in the PSP secure
world in the form of their RMP, but in a different format.

There is another copy in KVM in its stage 2 translation but..
Maybe userspace could just find a way to link the KVM context to IOMMU
in the first place, then let KVM & IOMMU directly negotiate the mapping
at runtime.
I think the KVM folks have said no to sharing the KVM stage 2 directly
with the iommu. They do too many operations that are incompatible with
the iommu requirements for the stage 2.

If that is true for the confidential compute, I don't know.

Still, continuing to duplicate the two mappings as we have always done
seems like a reasonable place to start and we want a memfd map anyhow
for other reasons:

https://lore.kernel.org/linux-iommu/20240806125602.GJ478300@nvidia.com/

Jason