Re: [RFC PATCH 05/10] vfio: Create a vfio_device from vma lookup
From: Alex Williamson <hidden>
Date: 2021-03-04 21:40:36
Also in:
lkml
On Thu, 25 Feb 2021 19:49:49 -0400 Jason Gunthorpe [off-list ref] wrote:
On Thu, Feb 25, 2021 at 03:21:13PM -0700, Alex Williamson wrote:quoted
This is where it gets tricky. The vm_pgoff we get from file_operations.mmap is already essentially describing an offset from the base of a specific resource. We could convert that from an absolute offset to a pfn offset, but it's only the bus driver code (ex. vfio-pci) that knows how to get the base, assuming there is a single base per region (we can't assume enough bits per region to store absolute pfn). Also note that you're suggesting that all vfio mmaps would need to standardize on the vfio-pci implementation of region layouts. Not that most drivers haven't copied vfio-pci, but we've specifically avoided exposing it as a fixed uAPI such that we could have the flexibility for a bus driver to implement regions offsets however they need.Okay, well the bus driver owns the address space and the bus driver is in control of the vm_pgoff. If it doesn't want to zap then it doesn't need to do anything vfio-pci can consistently use the index encoding and be finequoted
So I'm not really sure what this looks like. Within vfio-pci we could keep the index bits in place to allow unmmap_mapping_range() to selectively zap matching vm_pgoffs but expanding that to a vfio standard such that the IOMMU backend can also extract a pfn looks very limiting, or ugly. Thanks,Lets add a op to convert a vma into a PFN range. The map code will pass the vma to the op and get back a pfn (or failure). pci will switch the vm_pgoff to an index, find the bar base and compute the pfn. It is starting to look more and more like dma buf though
How terrible would it be if vfio-core used a shared vm_private_data
structure and inserted itself into the vm_ops call chain to reference
count that structure? We already wrap the device file descriptor via
vfio_device_fops.mmap, so we could:
struct vfio_vma_private_data *priv;
priv = kzalloc(...
priv->device = device;
vma->vm_private_data = priv;
device->ops->mmap(device->device_data, vma);
if (vma->vm_private_data == priv) {
priv->vm_ops = vma->vm_ops;
vma->vm_ops = &vfio_vm_ops;
} else
kfree(priv);
Where:
struct vfio_vma_private_data {
struct vfio_device *device;
unsigned long pfn_base;
void *private_data; // maybe not needed
const struct vm_operations_struct *vm_ops;
struct kref kref;
unsigned int release_notification:1;
};
Therefore unless a bus driver opts-out by replacing vm_private_data, we
can identify participating vmas by the vm_ops and have flags indicating
if the vma maps device memory such that vfio_get_device_from_vma()
should produce a device reference. The vfio IOMMU backends would also
consume this, ie. if they get a valid vfio_device from the vma, use the
pfn_base field directly. vfio_vm_ops would wrap the bus driver
callbacks and provide reference counting on open/close to release this
object.
I'm not thrilled with a vfio_device_ops callback plumbed through
vfio-core to do vma-to-pfn translation, so I thought this might be a
better alternative. Thanks,
Alex