Thread (233 messages) 233 messages, 15 authors, 2021-10-28

Re: [RFC] /dev/ioasid uAPI proposal

From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2021-06-01 20:28:40
Also in: linux-iommu, lkml

On Tue, Jun 01, 2021 at 07:01:57AM +0000, Tian, Kevin wrote:
quoted
From: Jason Gunthorpe <jgg@nvidia.com>
Sent: Saturday, May 29, 2021 4:03 AM

On Thu, May 27, 2021 at 07:58:12AM +0000, Tian, Kevin wrote:
quoted
/dev/ioasid provides an unified interface for managing I/O page tables for
devices assigned to userspace. Device passthrough frameworks (VFIO,
vDPA,
quoted
etc.) are expected to use this interface instead of creating their own logic to
isolate untrusted device DMAs initiated by userspace.
It is very long, but I think this has turned out quite well. It
certainly matches the basic sketch I had in my head when we were
talking about how to create vDPA devices a few years ago.

When you get down to the operations they all seem pretty common sense
and straightfoward. Create an IOASID. Connect to a device. Fill the
IOASID with pages somehow. Worry about PASID labeling.

It really is critical to get all the vendor IOMMU people to go over it
and see how their HW features map into this.
Agree. btw I feel it might be good to have several design opens 
centrally discussed after going through all the comments. Otherwise 
they may be buried in different sub-threads and potentially with 
insufficient care (especially for people who haven't completed the
reading).

I summarized five opens here, about:

1)  Finalizing the name to replace /dev/ioasid;
2)  Whether one device is allowed to bind to multiple IOASID fd's;
3)  Carry device information in invalidation/fault reporting uAPI;
4)  What should/could be specified when allocating an IOASID;
5)  The protocol between vfio group and kvm;

For 1), two alternative names are mentioned: /dev/iommu and 
/dev/ioas. I don't have a strong preference and would like to hear 
votes from all stakeholders. /dev/iommu is slightly better imho for 
two reasons. First, per AMD's presentation in last KVM forum they 
implement vIOMMU in hardware thus need to support user-managed 
domains. An iommu uAPI notation might make more sense moving 
forward. Second, it makes later uAPI naming easier as 'IOASID' can 
be always put as an object, e.g. IOMMU_ALLOC_IOASID instead of 
IOASID_ALLOC_IOASID. :)
I think two years ago I suggested /dev/iommu and it didn't go very far
at the time. We've also talked about this as /dev/sva for a while and
now /dev/ioasid

I think /dev/iommu is fine, and call the things inside them IOAS
objects.

Then we don't have naming aliasing with kernel constructs.
 
For 2), Jason prefers to not blocking it if no kernel design reason. If 
one device is allowed to bind multiple IOASID fd's, the main problem
is about cross-fd IOASID nesting, e.g. having gpa_ioasid created in fd1 
and giova_ioasid created in fd2 and then nesting them together (and
Huh? This can't happen

Creating an IOASID is an operation on on the /dev/ioasid FD. We won't
provide APIs to create a tree of IOASID's outside a single FD container.

If a device can consume multiple IOASID's it doesn't care how many or
what /dev/ioasid FDs they come from.
To the other end there was also thought whether we should make
a single I/O address space per IOASID fd. This was discussed in previous
thread that #fd's are insufficient to afford theoretical 1M's address
spaces per device. But let's have another revisit and draw a clear
conclusion whether this option is viable.
I had remarks on this, I think per-fd doesn't work
 
This implies that VFIO_BOUND_IOASID will be extended to allow user
specify a device label. This label will be recorded in /dev/iommu to
serve per-device invalidation request from and report per-device 
fault data to the user.
I wonder which of the user providing a 64 bit cookie or the kernel
returning a small IDA is the best choice here? Both have merits
depending on what qemu needs..
In addition, vPASID (if provided by user) will
be also recorded in /dev/iommu so vPASID<->pPASID conversion 
is conducted properly. e.g. invalidation request from user carries
a vPASID which must be converted into pPASID before calling iommu
driver. Vice versa for raw fault data which carries pPASID while the
user expects a vPASID.
I don't think the PASID should be returned at all. It should return
the IOASID number in the FD and/or a u64 cookie associated with that
IOASID. Userspace should figure out what the IOASID & device
combination means.
Seems to close this design open we have to touch the kAPI design. and 
Joerg's input is highly appreciated here.
uAPI is forever, the kAPI is constantly changing. I always dislike
warping the uAPI based on the current kAPI situation.

Jason
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help