Re: [RFC PATCH 05/10] vfio: Create a vfio_device from vma lookup
From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2021-03-05 00:36:49
Also in:
lkml
On Thu, Mar 04, 2021 at 05:07:31PM -0700, Alex Williamson wrote:
On Thu, 4 Mar 2021 19:16:33 -0400 Jason Gunthorpe [off-list ref] wrote:quoted
On Thu, Mar 04, 2021 at 02:37:57PM -0700, Alex Williamson wrote:quoted
Therefore unless a bus driver opts-out by replacing vm_private_data, we can identify participating vmas by the vm_ops and have flags indicating if the vma maps device memory such that vfio_get_device_from_vma() should produce a device reference. The vfio IOMMU backends would also consume this, ie. if they get a valid vfio_device from the vma, use the pfn_base field directly. vfio_vm_ops would wrap the bus driver callbacks and provide reference counting on open/close to release this object.quoted
I'm not thrilled with a vfio_device_ops callback plumbed through vfio-core to do vma-to-pfn translation, so I thought this might be a better alternative. Thanks,Maybe you could explain why, because I'm looking at this idea and thinking it looks very complicated compared to a simple driver op callback?vfio-core needs to export a vfio_vma_to_pfn() which I think assumes the caller has already used vfio_device_get_from_vma(), but should still validate the vma is one from a vfio device before calling this new vfio_device_ops callback.
Huh? Validate? Why?
Something like this in the IOMMU stuff:
struct vfio_device *vfio = vfio_device_get_from_vma(vma)
if (!vfio->vma_to_pfn)
return -EINVAL;
vfio->ops->vma_to_pfn(vfio, vma, offset_from_vma_start)
Is fine, why do we need to over complicate?
I don't need to check that the vma belongs to the vfio because it is
the API contract that the caller will guarantee that.
This is the kernel, I can (and do!) check these sorts of things by
code inspection when working on stuff - I can look at every
implementation and every call site to prove these things.
IMHO doing an expensive check like that is a style of defensive
programming the kernel community frowns upon.
vfio-pci needs to validate the vm_pgoff value falls within a BAR region, mask off the index and get the pci_resource_start() for the BAR index.
It needs to do the same thing fault() already does, which is currently one line..
Then we need a solution for how vfio_device_get_from_vma() determines whether to grant a device reference for a given vma, where that vma may map something other than device memory. Are you imagining that we hand out device references independently and vfio_vma_to_pfn() would return an errno for vm_pgoff values that don't map device memory and the IOMMU driver would release the reference?
That seems a reasonable place to start
prevent using unmmap_mapping_range(). The IOMMU backend, once it has a vfio_device via vfio_device_get_from_vma() can know the format of vm_private_data, cast it as a vfio_vma_private_data and directly use base_pfn, accomplishing the big point. They're all operating in the agreed upon vm_private_data format. Thanks,
If we force all drivers into a mandatory (!) vm_private_data format then every driver has to be audited and updated before the new pfn code can be done. If any driver in the future makes a mistake here (and omitting the unique vm_private_data magics is a very easy mistake to make) then it will cause a kernel crash in an obscure scenario. It is the "design the API to be hard to use wrong" philosophy. Jason