Re: [RFC] /dev/ioasid uAPI proposal
From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2021-06-01 20:28:40
Also in: linux-iommu, lkml
On Tue, Jun 01, 2021 at 07:01:57AM +0000, Tian, Kevin wrote:
> > From: Jason Gunthorpe <jgg@nvidia.com>
> > Sent: Saturday, May 29, 2021 4:03 AM
> >
> > On Thu, May 27, 2021 at 07:58:12AM +0000, Tian, Kevin wrote:
> > > /dev/ioasid provides a unified interface for managing I/O page tables for devices assigned to userspace. Device passthrough frameworks (VFIO, vDPA, etc.) are expected to use this interface instead of creating their own logic to isolate untrusted device DMAs initiated by userspace.
> >
> > It is very long, but I think this has turned out quite well. It certainly matches the basic sketch I had in my head when we were talking about how to create vDPA devices a few years ago.
> >
> > When you get down to the operations they all seem pretty common sense and straightforward. Create an IOASID. Connect to a device. Fill the IOASID with pages somehow. Worry about PASID labeling.
> >
> > It really is critical to get all the vendor IOMMU people to go over it and see how their HW features map into this.
>
> Agree. btw I feel it might be good to have several design opens centrally discussed after going through all the comments. Otherwise they may be buried in different sub-threads and potentially with insufficient care (especially for people who haven't completed the reading).
>
> I summarized five opens here, about:
>
> 1) Finalizing the name to replace /dev/ioasid;
> 2) Whether one device is allowed to bind to multiple IOASID fd's;
> 3) Carry device information in invalidation/fault reporting uAPI;
> 4) What should/could be specified when allocating an IOASID;
> 5) The protocol between vfio group and kvm;
>
> For 1), two alternative names are mentioned: /dev/iommu and /dev/ioas. I don't have a strong preference and would like to hear votes from all stakeholders. /dev/iommu is slightly better imho for two reasons. First, per AMD's presentation in last KVM forum they implement vIOMMU in hardware thus need to support user-managed domains. An iommu uAPI notation might make more sense moving forward. Second, it makes later uAPI naming easier as 'IOASID' can be always put as an object, e.g. IOMMU_ALLOC_IOASID instead of IOASID_ALLOC_IOASID. :)

I think two years ago I suggested /dev/iommu and it didn't go very far at the time. We've also talked about this as /dev/sva for a while and now /dev/ioasid.

I think /dev/iommu is fine, and call the things inside them IOAS objects. Then we don't have naming aliasing with kernel constructs.

> For 2), Jason prefers not to block it if there is no kernel design reason. If one device is allowed to bind multiple IOASID fd's, the main problem is about cross-fd IOASID nesting, e.g. having gpa_ioasid created in fd1 and giova_ioasid created in fd2 and then nesting them together (and

Huh? This can't happen.

Creating an IOASID is an operation on the /dev/ioasid FD. We won't provide APIs to create a tree of IOASID's outside a single FD container.

If a device can consume multiple IOASID's it doesn't care how many or what /dev/ioasid FDs they come from.
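
To illustrate why cross-fd nesting cannot even be expressed, here is a minimal userspace model, not proposed uAPI. It assumes that an IOASID is just a number scoped to one open FD and that a nesting parent is named by such a number. `struct ioas_ctx`, `ioas_ctx_init()`, `ioas_alloc()` and `MAX_IOAS` are all hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IOAS 8 /* arbitrary limit for the sketch */

/* Hypothetical model of one open /dev/ioasid FD: a container of IOASIDs. */
struct ioas_ctx {
    int used[MAX_IOAS];   /* 1 if the slot holds a live IOASID */
    int parent[MAX_IOAS]; /* parent IOASID when nested, -1 otherwise */
};

void ioas_ctx_init(struct ioas_ctx *ctx)
{
    for (int i = 0; i < MAX_IOAS; i++) {
        ctx->used[i] = 0;
        ctx->parent[i] = -1;
    }
}

/*
 * Allocate an IOASID inside ctx. 'parent' is either -1 (no nesting) or an
 * IOASID number that is looked up in the same ctx. There is no argument
 * that could name another container, so a tree never spans two FDs.
 * Returns the new IOASID number, or -1 on failure.
 */
int ioas_alloc(struct ioas_ctx *ctx, int parent)
{
    assert(ctx != NULL);

    if (parent != -1 &&
        (parent < 0 || parent >= MAX_IOAS || !ctx->used[parent]))
        return -1; /* parent does not exist in this container */

    for (int i = 0; i < MAX_IOAS; i++) {
        if (!ctx->used[i]) {
            ctx->used[i] = 1;
            ctx->parent[i] = parent;
            return i;
        }
    }
    return -1; /* container is full */
}
```

In this model, a gpa_ioasid allocated in fd1 can be the parent of a giova_ioasid in fd1, but passing the same number to an allocation on fd2 is resolved against fd2's own table and fails if nothing lives there.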

> At the other end, there was also the thought of whether we should make a single I/O address space per IOASID fd. This was discussed in a previous thread: the number of fds is insufficient to cover the theoretical 1M address spaces per device. But let's revisit it and draw a clear conclusion on whether this option is viable.

I had remarks on this, I think per-fd doesn't work.

> This implies that VFIO_BOUND_IOASID will be extended to allow the user to specify a device label. This label will be recorded in /dev/iommu to serve per-device invalidation requests from, and report per-device fault data to, the user.

I wonder which of the user providing a 64-bit cookie or the kernel returning a small IDA is the best choice here? Both have merits depending on what qemu needs.
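
The two labeling options can be compared with a small userspace model. This is only a sketch under assumptions: the thread defines neither interface, and `struct dev_table`, `bind_cookie()`, `label_cookie()`, `bind_small_id()` and `MAX_DEV` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_DEV 4 /* arbitrary limit for the sketch */

/* Hypothetical per-FD table of bound devices. */
struct dev_table {
    bool     bound[MAX_DEV];
    uint64_t cookie[MAX_DEV]; /* only meaningful for option A */
};

/* Lowest free slot, or -1 when the table is full. */
static int first_free(const struct dev_table *t)
{
    for (int i = 0; i < MAX_DEV; i++)
        if (!t->bound[i])
            return i;
    return -1;
}

/*
 * Option A: userspace picks an opaque 64-bit cookie at bind time and the
 * kernel side only stores it, to echo it back in later reports.
 * Returns 0 on success, -1 if no slot is free.
 */
int bind_cookie(struct dev_table *t, uint64_t user_cookie)
{
    int slot = first_free(t);

    if (slot < 0)
        return -1;
    t->bound[slot] = true;
    t->cookie[slot] = user_cookie;
    return 0;
}

/* Label a report about internal slot 'slot' would carry under option A. */
uint64_t label_cookie(const struct dev_table *t, int slot)
{
    assert(slot >= 0 && slot < MAX_DEV && t->bound[slot]);
    return t->cookie[slot];
}

/*
 * Option B: the kernel side picks the lowest free small integer (IDA-like)
 * and returns it; userspace must remember which device got which id.
 * Returns the id, or -1 if no slot is free.
 */
int bind_small_id(struct dev_table *t)
{
    int slot = first_free(t);

    if (slot < 0)
        return -1;
    t->bound[slot] = true;
    return slot;
}
```

With a cookie, userspace can store a pointer or its own index and needs no lookup on the report path. With a small id, the value is dense enough to index an array directly, but its meaning is decided by the kernel side.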

> In addition, vPASID (if provided by the user) will also be recorded in /dev/iommu so that vPASID<->pPASID conversion is conducted properly, e.g. an invalidation request from the user carries a vPASID which must be converted into a pPASID before calling the iommu driver. Vice versa for raw fault data, which carries a pPASID while the user expects a vPASID.

I don't think the PASID should be returned at all. It should return the IOASID number in the FD and/or a u64 cookie associated with that IOASID. Userspace should figure out what the IOASID & device combination means.
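
A sketch of what "userspace figures it out" could look like follows. It assumes a report that carries only the IOASID number and a device label, with the vPASID kept entirely in a userspace-private table. The record layout and the names `struct ioas_fault`, `struct vmm_binding` and `vmm_fault_to_vpasid()` are hypothetical and not taken from the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fault record: no PASID, only IOASID and device label. */
struct ioas_fault {
    uint32_t ioasid;    /* IOASID number within the FD */
    uint64_t dev_label; /* label userspace associated with the device */
    uint64_t addr;      /* faulting I/O address */
};

/* Userspace-private bookkeeping, filled in when the VMM binds things. */
struct vmm_binding {
    uint32_t ioasid;
    uint64_t dev_label;
    uint32_t vpasid; /* what the guest calls this address space */
};

/*
 * Resolve the (IOASID, device) pair of a fault to the guest-visible vPASID.
 * Returns the vPASID, or -1 if userspace has no matching binding.
 */
int64_t vmm_fault_to_vpasid(const struct vmm_binding *tbl, size_t n,
                            const struct ioas_fault *f)
{
    assert(tbl != NULL && f != NULL);

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].ioasid == f->ioasid &&
            tbl[i].dev_label == f->dev_label)
            return (int64_t)tbl[i].vpasid;
    }
    return -1;
}
```

In this arrangement the pPASID never crosses the uAPI boundary, and the vPASID never has to be recorded on the kernel side for fault reporting.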

> It seems that to close this design open we have to touch the kAPI design, and Joerg's input is highly appreciated here.

uAPI is forever, the kAPI is constantly changing. I always dislike warping the uAPI based on the current kAPI situation.

Jason