Re: [RFC v2] /dev/iommu uAPI proposal
From: Eric Auger <eric.auger@redhat.com>
Date: 2021-08-10 07:17:32
Also in: linux-iommu, lkml
Hi Kevin,

On 8/5/21 2:36 AM, Tian, Kevin wrote:
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Wednesday, August 4, 2021 11:59 PM
>>
[...]
>>> 1.2. Attach Device to I/O address space
>>> +++++++++++++++++++++++++++++++++++++++
>>>
>>> Device attach/bind is initiated through passthrough framework uAPI.
>>>
>>> Device attaching is allowed only after a device is successfully bound to
>>> the IOMMU fd. User should provide a device cookie when binding the
>>> device through VFIO uAPI. This cookie is used when the user queries
>>> device capability/format, issues per-device iotlb invalidation and
>>> receives per-device I/O page fault data via IOMMU fd.
>>>
>>> Successful binding puts the device into a security context which isolates
>>> its DMA from the rest system. VFIO should not allow user to access the
>>> device before binding is completed. Similarly, VFIO should prevent the
>>> user from unbinding the device before user access is withdrawn.
>> s/from the rest system/from the rest of the system
>>
>> With Intel scalable IOV, I understand you could assign an RID/PASID to
>> one VM and another one to another VM (which is not the case for ARM).
>> Is it a targeted use case? How would it be handled? Is it related to the
>> sub-groups evoked hereafter?
> Not related to sub-group. Each mdev is bound to the IOMMU fd respectively
> with the defPASID which represents the mdev.
But how does it work in terms of security? The device (RID) is bound to
an IOMMU fd, but then each SID/PASID may be working for a different VM.
How do you determine that this is safe, i.e. that each SID can work
safely for a different VM, as opposed to the ARM case where this is not
possible?
Section 1.3 says:

"1) A successful binding call for the first device in the group creates
the security context for the entire group, by:"

What does that mean for the above scalable IOV use case?
>> Actually all devices bound to an IOMMU fd should have the same parent
>> I/O address space or root address space, am I correct? If so, maybe add
>> this comment explicitly?
> in most cases yes but it's not mandatory. multiple roots are allowed
> (e.g. with vIOMMU but no nesting).
OK, right, this corresponds to example 4.2, for instance. I had
misinterpreted the notion of security context: it does not map to the
IOMMU fd but is created implicitly on the first device binding.
[...]
>>> The device in the /dev/iommu context always refers to a physical one
>>> (pdev) which is identifiable via RID. Physically each pdev can support
>>> one default I/O address space (routed via RID) and optionally multiple
>>> non-default I/O address spaces (via RID+PASID).
>>>
>>> The device in VFIO context is a logic concept, being either a physical
>>> device (pdev) or mediated device (mdev or subdev). Each vfio device
>>> is represented by RID+cookie in IOMMU fd. User is allowed to create
>>> one default I/O address space (routed by vRID from user p.o.v) per
>>> each vfio_device.
>> The concept of default address space is not fully clear to me. I
>> currently understand this is a root address space (not nesting). Is that
>> correct? This may need clarification.
> w/o PASID there is only one address space (either GPA or GIOVA)
> per device. This one is called default. whether it's root is orthogonal
> (e.g. GIOVA could be also nested) to the device view of this space.
>
> w/ PASID additional address spaces can be targeted by the device.
> those are called non-default.
>
> I could also rename default to RID address space and non-default to
> RID+PASID address space if doing so makes it clearer.
Yes, I think it is worth having a kind of glossary defining root and
default, just as you clearly defined child/parent.
>>> VFIO decides the routing information for this default space based on
>>> device type:
>>>
>>> 1) pdev, routed via RID;
>>>
>>> 2) mdev/subdev with IOMMU-enforced DMA isolation, routed via the
>>> parent's RID plus the PASID marking this mdev;
>>>
>>> 3) a purely sw-mediated device (sw mdev), no routing required i.e. no
>>> need to install the I/O page table in the IOMMU. sw mdev just uses
>>> the metadata to assist its internal DMA isolation logic on top of
>>> the parent's IOMMU page table;
>> Maybe you should introduce this concept of SW mediated device earlier
>> because it seems to special case the way the attach behaves. I am
>> especially referring to "Successful attaching activates an I/O address
>> space in the IOMMU, if the device is not purely software mediated".
> makes sense.
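The three routing cases for the default space can be summarized as a small selection function. This is a hypothetical sketch: enum vdev_kind, struct default_route and pick_default_route() are invented names. For an mdev/subdev, the rid argument is assumed to be the parent's RID and def_pasid the PASID marking the mdev.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device kinds, mirroring cases 1) to 3) above. */
enum vdev_kind { VDEV_PDEV, VDEV_MDEV_IOMMU, VDEV_SW_MDEV };

struct default_route {
    bool install_in_iommu;  /* false: nothing is installed in the IOMMU */
    uint16_t rid;
    bool has_pasid;
    uint32_t pasid;
};

static struct default_route pick_default_route(enum vdev_kind kind,
                                               uint16_t rid,
                                               uint32_t def_pasid)
{
    struct default_route r = { .install_in_iommu = true, .rid = rid };

    switch (kind) {
    case VDEV_PDEV:
        /* 1) pdev: routed via its own RID, no PASID. */
        break;
    case VDEV_MDEV_IOMMU:
        /* 2) mdev/subdev: parent's RID plus the PASID marking this mdev. */
        r.has_pasid = true;
        r.pasid = def_pasid;
        break;
    case VDEV_SW_MDEV:
        /* 3) sw mdev: metadata only, routing fields are unused. */
        r = (struct default_route){ .install_in_iommu = false };
        break;
    default:
        assert(0 && "unknown device kind");
    }
    return r;
}
```

Case 3 returning install_in_iommu == false is exactly the special case Eric suggests introducing earlier.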
>>> In addition, VFIO may allow user to create additional I/O address spaces
>>> on a vfio_device based on the hardware capability. In such case the user
>>> has its own view of the virtual routing information (vPASID) when marking
>>> these non-default address spaces.
>> I do not understand what "marking these non-default address spaces"
>> means.
> as explained above, those non-default address spaces are identified/routed
> via PASID.
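If the user-visible vPASID differs from the PASID actually used for routing, the kernel needs some per-device translation. The quoted text does not say how this is done. The table below is purely an illustrative assumption, with hypothetical names and a hypothetical fixed size limit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_SPACES 8   /* hypothetical per-device limit */

/* Hypothetical vPASID -> PASID table for one vfio_device. */
struct pasid_map {
    size_t n;
    uint32_t vpasid[SKETCH_MAX_SPACES];  /* user's view */
    uint32_t pasid[SKETCH_MAX_SPACES];   /* PASID used for routing */
};

static int pasid_map_add(struct pasid_map *m, uint32_t vpasid, uint32_t pasid)
{
    assert(m != NULL);
    for (size_t i = 0; i < m->n; i++)
        if (m->vpasid[i] == vpasid)
            return -EEXIST;   /* each vPASID marks one address space */
    if (m->n == SKETCH_MAX_SPACES)
        return -ENOSPC;
    m->vpasid[m->n] = vpasid;
    m->pasid[m->n] = pasid;
    m->n++;
    return 0;
}

static int pasid_map_lookup(const struct pasid_map *m, uint32_t vpasid,
                            uint32_t *pasid)
{
    assert(m != NULL && pasid != NULL);
    for (size_t i = 0; i < m->n; i++) {
        if (m->vpasid[i] == vpasid) {
            *pasid = m->pasid[i];
            return 0;
        }
    }
    return -ENOENT;
}
```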
>>> 1.3. Group isolation
>>> ++++++++++++++++++++
[...]
>>> 1) A successful binding call for the first device in the group creates
>>> the security context for the entire group, by:
>>>
>>> * Verifying group viability in a similar way as VFIO does;
>>>
>>> * Calling IOMMU-API to move the group into a block-dma state,
>>> which makes all devices in the group attached to an block-dma
>>> domain with an empty I/O page table;
>> This block-dma state/domain would deserve to be better defined (I know
>> you already evoked it in 1.1 with the DMA mapping protocol though). Does
>> it mean that it activates an empty I/O page table in the IOMMU (if the
>> device is not purely SW mediated)?
> sure. some explanations are scattered in following paragraph, but I can
> consider to further clarify it.
>> How does that relate to the default address space? Is it the same?
> different. this block-dma domain doesn't hold any valid mapping. The
> default address space is represented by a normal unmanaged domain. the
> ioasid attaching operation will detach the device from the block-dma
> domain and then attach it to the target ioasid.
OK

Thanks

Eric
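Kevin's answer about the block-dma domain versus the default address space can be sketched as follows. The names are hypothetical, and a single static block-dma instance is used for brevity. The detach behavior (falling back to block-dma) is an assumption not stated in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical domain kinds. */
enum domain_kind { DOM_BLOCK_DMA, DOM_UNMANAGED };

struct domain_sketch {
    enum domain_kind kind;
    unsigned int nr_mappings;   /* valid mappings held by the domain */
};

struct device_sketch {
    struct domain_sketch *domain;   /* NULL until the first bind */
};

/* Empty domain: holds no valid mapping, so all DMA is blocked. */
static struct domain_sketch block_dma = { .kind = DOM_BLOCK_DMA, .nr_mappings = 0 };

/* First bind in the group: park the device on the block-dma domain. */
static void enter_block_dma(struct device_sketch *dev)
{
    assert(dev != NULL);
    dev->domain = &block_dma;
}

/* IOASID attach: leave block-dma and switch to the target unmanaged domain. */
static int attach_ioasid(struct device_sketch *dev, struct domain_sketch *target)
{
    assert(dev != NULL && target != NULL);
    if (dev->domain != &block_dma)
        return -EINVAL;   /* not bound yet, or already attached */
    if (target->kind != DOM_UNMANAGED)
        return -EINVAL;
    dev->domain = target;
    return 0;
}

/* Assumption: detaching falls back to block-dma rather than to no domain. */
static void detach_ioasid(struct device_sketch *dev)
{
    assert(dev != NULL);
    if (dev->domain != NULL)
        dev->domain = &block_dma;
}
```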
>>> 2. uAPI Proposal
>>> ----------------------
[...]
>>> /*
>>>   * Allocate an IOASID.
>>>   *
>>>   * IOASID is the FD-local software handle representing an I/O address
>>>   * space. Each IOASID is associated with a single I/O page table. User
>>>   * must call this ioctl to get an IOASID for every I/O address space that is
>>>   * intended to be tracked by the kernel.
>>>   *
>>>   * User needs to specify the attributes of the IOASID and associated
>>>   * I/O page table format information according to one or multiple devices
>>>   * which will be attached to this IOASID right after. The I/O page table
>>>   * is activated in the IOMMU when it's attached by a device. Incompatible
>> .. if not SW mediated
>>>   * format between device and IOASID will lead to attaching failure.
>>>   *
>>>   * The root IOASID should always have a kernel-managed I/O page
>>>   * table for safety. Locked page accounting is also conducted on the root.
>> The definition of root IOASID is not easily found in this spec. Maybe
>> this would deserve some clarification.
> make sense. and thanks for other typo-related comments.
>
> Thanks
> Kevin
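The rules in the quoted IOASID comment can be sketched as below: a root IOASID must have a kernel-managed page table, locked pages are accounted on the root, and a format mismatch fails the attach. All names are hypothetical. Skipping the format check for purely SW-mediated devices follows Eric's ".. if not SW mediated" remark rather than the RFC text itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical IOASID record; field names are illustrative only. */
struct ioasid_sketch {
    struct ioasid_sketch *parent;  /* NULL for a root IOASID */
    bool kernel_managed;           /* I/O page table owned by the kernel */
    int format;                    /* hypothetical page table format id */
    unsigned long locked_pages;    /* only ever charged on the root */
};

static int ioasid_init(struct ioasid_sketch *ioasid, struct ioasid_sketch *parent,
                       bool kernel_managed, int format)
{
    assert(ioasid != NULL);
    /* For safety a root IOASID must use a kernel-managed page table. */
    if (parent == NULL && !kernel_managed)
        return -EINVAL;
    *ioasid = (struct ioasid_sketch){
        .parent = parent,
        .kernel_managed = kernel_managed,
        .format = format,
    };
    return 0;
}

/* Locked page accounting is charged to the root of the hierarchy. */
static void ioasid_account(struct ioasid_sketch *ioasid, unsigned long npages)
{
    assert(ioasid != NULL);
    while (ioasid->parent != NULL)
        ioasid = ioasid->parent;
    ioasid->locked_pages += npages;
}

/* Attach-time check: a format mismatch fails, unless purely SW mediated. */
static int ioasid_attach_check(const struct ioasid_sketch *ioasid, int dev_format,
                               bool sw_mediated)
{
    assert(ioasid != NULL);
    if (sw_mediated)
        return 0;   /* nothing is installed in the IOMMU */
    return ioasid->format == dev_format ? 0 : -EINVAL;
}
```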