Thread (31 messages), 9 authors, 2021-12-09

Re: [RFC v16 1/9] iommu: Introduce attach/detach_pasid_table API

From: Jean-Philippe Brucker <hidden>
Date: 2021-12-08 17:21:07
Also in: kvmarm, linux-iommu, lkml

On Wed, Dec 08, 2021 at 08:56:16AM -0400, Jason Gunthorpe wrote:
From a progress perspective I would like to start with simple 'page
tables in userspace', ie no PASID in this step.

'page tables in userspace' means an iommufd ioctl to create an
iommu_domain where the IOMMU HW is directly traversing a
device-specific page table structure in user space memory. All the HW
today implements this by using another iommu_domain to allow the IOMMU
HW DMA access to user memory - ie nesting or multi-stage or whatever.

This would come along with some ioctls to invalidate the IOTLB.

I'm imagining this step as an iommu_group->op->create_user_domain()
driver callback which will create a new kind of domain with
domain-unique ops. Ie map/unmap related should all be NULL as those
are impossible operations.
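
As a rough illustration of the callback idea above, here is a minimal, self-contained C sketch. Everything in it is hypothetical: the `sketch_*` names, the ops structure, and the error codes are simplified stand-ins chosen for this example, not the kernel's real `iommu_domain` or `iommu_ops` definitions. It only shows the shape of the proposal: a domain whose ops table leaves map/unmap as NULL because the page table lives in user memory, while IOTLB invalidation remains available.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's real structures. */
struct sketch_domain;

struct sketch_domain_ops {
	int (*map)(struct sketch_domain *d, unsigned long iova,
		   unsigned long pa, size_t size);
	size_t (*unmap)(struct sketch_domain *d, unsigned long iova,
			size_t size);
	void (*invalidate_iotlb)(struct sketch_domain *d, unsigned long iova,
				 size_t size);
};

struct sketch_domain {
	const struct sketch_domain_ops *ops;
	unsigned long user_pgd;      /* address of the user-owned page table */
	unsigned long invalidations; /* counts invalidation requests */
};

/* The only operation a user-owned domain supports in this sketch. */
static void sketch_user_invalidate(struct sketch_domain *d,
				   unsigned long iova, size_t size)
{
	(void)iova;
	(void)size;
	d->invalidations++;
}

/* Domain-unique ops: map/unmap are NULL since the kernel cannot edit
 * a page table that userspace owns. */
static const struct sketch_domain_ops sketch_user_domain_ops = {
	.map = NULL,
	.unmap = NULL,
	.invalidate_iotlb = sketch_user_invalidate,
};

/* Hypothetical analogue of a create_user_domain() driver callback. */
static int sketch_create_user_domain(struct sketch_domain *d,
				     unsigned long user_pgd)
{
	if (!d || !user_pgd)
		return -EINVAL;
	d->ops = &sketch_user_domain_ops;
	d->user_pgd = user_pgd;
	d->invalidations = 0;
	return 0;
}

/* Generic wrapper: callers see an error instead of a NULL dereference. */
static int sketch_map(struct sketch_domain *d, unsigned long iova,
		      unsigned long pa, size_t size)
{
	if (!d->ops->map)
		return -EOPNOTSUPP;
	return d->ops->map(d, iova, pa, size);
}

/* Generic wrapper for the invalidation path. */
static void sketch_invalidate(struct sketch_domain *d, unsigned long iova,
			      size_t size)
{
	if (d->ops->invalidate_iotlb)
		d->ops->invalidate_iotlb(d, iova, size);
}
```

Checking for a NULL op in the wrapper is one possible way to make the "impossible operations" fail cleanly; other designs could reject such calls earlier.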

From there the usual struct device (ie RID) attach/detach stuff needs
to take care of routing DMAs to this iommu_domain.

Step two would be to add the ability for an iommufd using driver to
request that a RID&PASID is connected to an iommu_domain. This
connection can be requested for any kind of iommu_domain, kernel owned
or user owned.

I don't quite have an answer how exactly the SMMUv3 vs Intel
difference in PASID routing should be resolved.

In SMMUv3 the user pgd is always stored in the PASID table (actually
called "context descriptor table" but I want to avoid confusion with the
VT-d "context table"). And to access the PASID table, the SMMUv3 first
translates its GPA into a PA using the stage-2 page table. For userspace to
pass individual pgds to the kernel, as opposed to passing whole PASID
tables, the host kernel needs to reserve GPA space and map it in stage-2,
so it can store the PASID table in there. Userspace manages GPA space.

This would be easy for a single pgd. In this case the PASID table has a
single entry and userspace could just pass one GPA page during
registration. However it isn't easily generalized to full PASID support,
because managing a multi-level PASID table will require runtime GPA
allocation, and that API is awkward. That's why we opted for an "attach PASID
table" operation rather than "attach page table" (back then the choice was
easy since VT-d used the same concept).
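
To make the runtime allocation problem concrete, here is a small, self-contained C sketch of a two-level PASID table managed by the host on behalf of userspace. All names and sizes are assumptions picked for illustration (64-byte entries, 1024 entries per leaf, at most 1024 leaves); they are not taken from this thread or from a driver. The point it tries to show: a single-pgd setup needs only one region reserved at registration, whereas installing a pgd for a PASID that falls into a new leaf forces a fresh GPA reservation at runtime.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sizes only; assumptions made for this sketch. */
#define SKETCH_ENTRY_SIZE   64u   /* bytes per PASID table entry */
#define SKETCH_LEAF_ENTRIES 1024u /* entries per second-level table */
#define SKETCH_MAX_LEAVES   1024u /* first-level fan-out */

struct sketch_pasid_table {
	uint8_t leaf_present[SKETCH_MAX_LEAVES]; /* 1 if leaf is allocated */
	size_t gpa_bytes_reserved;               /* GPA space consumed so far */
};

/*
 * Install a pgd for @pasid.
 * Returns 1 if a new leaf (and thus new GPA space) had to be reserved,
 * 0 if the leaf already existed, -1 if @pasid is out of range.
 */
static int sketch_install_pgd(struct sketch_pasid_table *t, uint32_t pasid)
{
	uint32_t leaf = pasid / SKETCH_LEAF_ENTRIES;

	if (leaf >= SKETCH_MAX_LEAVES)
		return -1;
	if (t->leaf_present[leaf])
		return 0;

	/* This is the awkward part: the host must find free GPA space,
	 * which userspace manages, while the guest is running. */
	t->leaf_present[leaf] = 1;
	t->gpa_bytes_reserved += (size_t)SKETCH_LEAF_ENTRIES * SKETCH_ENTRY_SIZE;
	return 1;
}
```

With these assumed sizes, PASIDs 0 and 5 share one 64 KiB leaf, while PASID 4096 lands in a different leaf and triggers a second reservation.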

So I think the simplest way to support nesting is still to have separate
modes of operation depending on the hardware.

Thanks,
Jean

To get answers I'm hoping to start building some sketch RFCs for these
different things on iommufd, hopefully in January. I'm looking at user
page tables, PASID, dirty tracking and userspace IO fault handling as
the main features iommufd must tackle.

The purpose of the sketches would be to validate that the HW features
we want to expose can work well with the choices the base is making.

Jason