Thread (36 messages) 36 messages, 4 authors, 2021-03-05

Re: [RFC PATCH 05/10] vfio: Create a vfio_device from vma lookup

From: Alex Williamson <hidden>
Date: 2021-03-04 21:40:36
Also in: lkml

On Thu, 25 Feb 2021 19:49:49 -0400
Jason Gunthorpe [off-list ref] wrote:
On Thu, Feb 25, 2021 at 03:21:13PM -0700, Alex Williamson wrote:
quoted
This is where it gets tricky.  The vm_pgoff we get from
file_operations.mmap is already essentially describing an offset from
the base of a specific resource.  We could convert that from an absolute
offset to a pfn offset, but it's only the bus driver code (ex.
vfio-pci) that knows how to get the base, assuming there is a single
base per region (we can't assume enough bits per region to store
absolute pfn).  Also note that you're suggesting that all vfio mmaps
would need to standardize on the vfio-pci implementation of region
layouts.  Not that most drivers haven't copied vfio-pci, but we've
specifically avoided exposing it as a fixed uAPI such that we could have
the flexibility for a bus driver to implement regions offsets however
they need.  
Okay, well the bus driver owns the address space and the bus driver is
in control of the vm_pgoff. If it doesn't want to zap then it doesn't
need to do anything

vfio-pci can consistently use the index encoding and be fine
 
quoted
So I'm not really sure what this looks like.  Within vfio-pci we could
keep the index bits in place to allow unmmap_mapping_range() to
selectively zap matching vm_pgoffs but expanding that to a vfio
standard such that the IOMMU backend can also extract a pfn looks very
limiting, or ugly.  Thanks,  
Lets add a op to convert a vma into a PFN range. The map code will
pass the vma to the op and get back a pfn (or failure).

pci will switch the vm_pgoff to an index, find the bar base and
compute the pfn.

It is starting to look more and more like dma buf though
How terrible would it be if vfio-core used a shared vm_private_data
structure and inserted itself into the vm_ops call chain to reference
count that structure?  We already wrap the device file descriptor via
vfio_device_fops.mmap, so we could:

	struct vfio_vma_private_data *priv;

	priv = kzalloc(...
	
	priv->device = device;
	vma->vm_private_data = priv;

	device->ops->mmap(device->device_data, vma);

	if (vma->vm_private_data == priv) {
		priv->vm_ops = vma->vm_ops;
		vma->vm_ops = &vfio_vm_ops;
	} else
		kfree(priv);

Where:

struct vfio_vma_private_data {
	struct vfio_device *device;
	unsigned long pfn_base;
	void *private_data; // maybe not needed
	const struct vm_operations_struct *vm_ops;
	struct kref kref;
	unsigned int release_notification:1;
};

Therefore unless a bus driver opts-out by replacing vm_private_data, we
can identify participating vmas by the vm_ops and have flags indicating
if the vma maps device memory such that vfio_get_device_from_vma()
should produce a device reference.  The vfio IOMMU backends would also
consume this, ie. if they get a valid vfio_device from the vma, use the
pfn_base field directly.  vfio_vm_ops would wrap the bus driver
callbacks and provide reference counting on open/close to release this
object.

I'm not thrilled with a vfio_device_ops callback plumbed through
vfio-core to do vma-to-pfn translation, so I thought this might be a
better alternative.  Thanks,

Alex
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help