Thread (36 messages) 36 messages, 4 authors, 2021-03-05

Re: [RFC PATCH 05/10] vfio: Create a vfio_device from vma lookup

From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2021-03-05 00:36:49
Also in: lkml

On Thu, Mar 04, 2021 at 05:07:31PM -0700, Alex Williamson wrote:
On Thu, 4 Mar 2021 19:16:33 -0400
Jason Gunthorpe [off-list ref] wrote:
quoted
On Thu, Mar 04, 2021 at 02:37:57PM -0700, Alex Williamson wrote:
quoted
Therefore unless a bus driver opts-out by replacing vm_private_data, we
can identify participating vmas by the vm_ops and have flags indicating
if the vma maps device memory such that vfio_get_device_from_vma()
should produce a device reference.  The vfio IOMMU backends would also
consume this, ie. if they get a valid vfio_device from the vma, use the
pfn_base field directly.  vfio_vm_ops would wrap the bus driver
callbacks and provide reference counting on open/close to release this
object.  
quoted
I'm not thrilled with a vfio_device_ops callback plumbed through
vfio-core to do vma-to-pfn translation, so I thought this might be a
better alternative.  Thanks,  
Maybe you could explain why, because I'm looking at this idea and
thinking it looks very complicated compared to a simple driver op
callback?
vfio-core needs to export a vfio_vma_to_pfn() which I think assumes the
caller has already used vfio_device_get_from_vma(), but should still
validate the vma is one from a vfio device before calling this new
vfio_device_ops callback.
Huh? Validate? Why?

Something like this in the IOMMU stuff:

   struct vfio_device *vfio = vfio_device_get_from_vma(vma)

   if (!vfio->vma_to_pfn)
        return -EINVAL;
   vfio->ops->vma_to_pfn(vfio, vma, offset_from_vma_start)

Is fine, why do we need to over complicate?

I don't need to check that the vma belongs to the vfio because it is
the API contract that the caller will guarantee that.

This is the kernel, I can (and do!) check these sorts of things by
code inspection when working on stuff - I can look at every
implementation and every call site to prove these things.

IMHO doing an expensive check like that is a style of defensive
programming the kernel community frowns upon.
vfio-pci needs to validate the vm_pgoff value falls within a BAR
region, mask off the index and get the pci_resource_start() for the
BAR index.
It needs to do the same thing fault() already does, which is currently
one line..
Then we need a solution for how vfio_device_get_from_vma() determines
whether to grant a device reference for a given vma, where that vma may
map something other than device memory. Are you imagining that we hand
out device references independently and vfio_vma_to_pfn() would return
an errno for vm_pgoff values that don't map device memory and the IOMMU
driver would release the reference?
That seems a reasonable place to start
prevent using unmmap_mapping_range().  The IOMMU backend, once it has a
vfio_device via vfio_device_get_from_vma() can know the format of
vm_private_data, cast it as a vfio_vma_private_data and directly use
base_pfn, accomplishing the big point.  They're all operating in the
agreed upon vm_private_data format.  Thanks,
If we force all drivers into a mandatory (!) vm_private_data format
then every driver has to be audited and updated before the new pfn
code can be done. If any driver in the future makes a mistake here
(and omitting the unique vm_private_data magics is a very easy mistake
to make) then it will cause a kernel crash in an obscure scenario.

It is the "design the API to be hard to use wrong" philosophy.

Jason
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help