Re: [RFC v2] /dev/iommu uAPI proposal

From: Eric Auger <eric.auger@redhat.com>
Date: 2021-08-10 07:17:32
Also in: linux-iommu, lkml

Hi Kevin,

On 8/5/21 2:36 AM, Tian, Kevin wrote:

quoted

From: Eric Auger <eric.auger@redhat.com>
Sent: Wednesday, August 4, 2021 11:59 PM

[...]

quoted

1.2. Attach Device to I/O address space
+++++++++++++++++++++++++++++++++++++++

Device attach/bind is initiated through passthrough framework uAPI.

Device attaching is allowed only after a device is successfully bound to
the IOMMU fd. User should provide a device cookie when binding the
device through VFIO uAPI. This cookie is used when the user queries
device capability/format, issues per-device iotlb invalidation and
receives per-device I/O page fault data via IOMMU fd.

Successful binding puts the device into a security context which isolates
its DMA from the rest system. VFIO should not allow user to access the

s/from the rest system/from the rest of the system

quoted

device before binding is completed. Similarly, VFIO should prevent the
user from unbinding the device before user access is withdrawn.

With Intel scalable IOV, I understand you could assign an RID/PASID to
one VM and another one to another VM (which is not the case for ARM). Is
it a targeted use case? How would it be handled? Is it related to the
sub-groups evoked hereafter?

Not related to sub-group. Each mdev is bound to the IOMMU fd respectively
with the defPASID which represents the mdev.

But how does it work in terms of security? The device (RID) is bound to
an IOMMU fd, but then each SID/PASID may be working for a different VM.
How do you detect that this is safe, given that each SID can work safely
for a different VM, versus the ARM case where it is not possible?

1.3 says
"

1)  A successful binding call for the first device in the group creates
    the security context for the entire group, by:
"
What does it mean for the above scalable IOV use case?

quoted

Actually all devices bound to an IOMMU fd should have the same parent
I/O address space or root address space, am I correct? If so, maybe add
this comment explicitly?

in most cases yes but it's not mandatory. multiple roots are allowed
(e.g. with vIOMMU but no nesting).

OK, right, this corresponds to example 4.2, for instance. I misinterpreted
the notion of security context. The security context does not match the
IOMMU fd but is something implicitly created on first device binding.

[...]

quoted

The device in the /dev/iommu context always refers to a physical one
(pdev) which is identifiable via RID. Physically each pdev can support
one default I/O address space (routed via RID) and optionally multiple
non-default I/O address spaces (via RID+PASID).

The device in VFIO context is a logical concept, being either a physical
device (pdev) or mediated device (mdev or subdev). Each vfio device
is represented by RID+cookie in IOMMU fd. User is allowed to create
one default I/O address space (routed by vRID from user p.o.v) per
each vfio_device.

The concept of default address space is not fully clear to me. I
currently understand this is a root address space (not nesting). Is that
correct? This may need clarification.

w/o PASID there is only one address space (either GPA or GIOVA)
per device. This one is called default. whether it's root is orthogonal
(e.g. GIOVA could be also nested) to the device view of this space.

w/ PASID additional address spaces can be targeted by the device.
those are called non-default.

I could also rename default to RID address space and non-default to 
RID+PASID address space if doing so makes it clearer.

Yes, I think it is worth having a kind of glossary and defining the root
and default address spaces, as you clearly defined child/parent.

quoted

VFIO decides the routing information for this default
space based on device type:

1)  pdev, routed via RID;

2)  mdev/subdev with IOMMU-enforced DMA isolation, routed via
    the parent's RID plus the PASID marking this mdev;

3)  a purely sw-mediated device (sw mdev), no routing required i.e. no
    need to install the I/O page table in the IOMMU. sw mdev just uses
    the metadata to assist its internal DMA isolation logic on top of
    the parent's IOMMU page table;
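
The three routing cases above can be sketched as a small decision function. This is only an illustration of the rule as quoted, not code from the proposal: the enum, the struct and the function name are all hypothetical, and the 20-bit PASID bound is the PCIe limit rather than something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is part of the proposed uAPI. */
enum vfio_dev_kind {
    DEV_PDEV,     /* physical device */
    DEV_HW_MDEV,  /* mdev/subdev with IOMMU-enforced DMA isolation */
    DEV_SW_MDEV,  /* purely software-mediated device */
};

struct routing_info {
    bool     install_pgtable; /* must the I/O page table go into the IOMMU? */
    bool     use_pasid;       /* is the space routed via RID+PASID? */
    uint16_t rid;
    uint32_t pasid;
};

/* Pick the routing information for the default I/O address space. */
static struct routing_info default_space_routing(enum vfio_dev_kind kind,
                                                 uint16_t rid,
                                                 uint32_t def_pasid)
{
    struct routing_info r = { 0 };

    switch (kind) {
    case DEV_PDEV:
        /* 1) routed via the device's own RID */
        r.install_pgtable = true;
        r.rid = rid;
        break;
    case DEV_HW_MDEV:
        /* 2) routed via the parent's RID plus the PASID marking this mdev */
        assert(def_pasid < (1u << 20)); /* PCIe PASIDs are at most 20 bits */
        r.install_pgtable = true;
        r.use_pasid = true;
        r.rid = rid;
        r.pasid = def_pasid;
        break;
    case DEV_SW_MDEV:
        /* 3) no routing: nothing is installed in the IOMMU */
        break;
    }
    return r;
}
```

In this sketch the sw mdev case returns an all-zero result, which is one way to express that attach does not activate anything in the IOMMU for such a device.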

Maybe you should introduce this concept of SW mediated device earlier
because it seems to special case the way the attach behaves. I am
especially referring to

"Successful attaching activates an I/O address space in the IOMMU, if the
device is not purely software mediated"

makes sense.

quoted

In addition, VFIO may allow user to create additional I/O address spaces
on a vfio_device based on the hardware capability. In such case the user
has its own view of the virtual routing information (vPASID) when marking
these non-default address spaces.

I do not understand what "marking these non-default address spaces" means.

as explained above, those non-default address spaces are identified/routed
via PASID.

quoted

1.3. Group isolation
++++++++++++++++++++

[...]

quoted

1)  A successful binding call for the first device in the group creates
    the security context for the entire group, by:

    * Verifying group viability in a similar way as VFIO does;

    * Calling IOMMU-API to move the group into a block-dma state,
      which makes all devices in the group attached to a block-dma
      domain with an empty I/O page table;

This block-dma state/domain deserves a better definition (I know
you already mentioned it in 1.1 with the DMA mapping protocol, though).
Does it activate an empty I/O page table in the IOMMU (if the device is
not purely SW mediated)?

sure. some explanations are scattered in following paragraph, but I
can consider clarifying it further.

quoted

How does that relate to the default address space? Is it the same?

different. this block-dma domain doesn't hold any valid mapping. The
default address space is represented by a normal unmanaged domain.
the ioasid attaching operation will detach the device from the block-dma
domain and then attach it to the target ioasid.
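
The transition described above (bind puts the device behind the block-dma domain, attach moves it from there to the target ioasid) can be modelled as a tiny state machine. This is a minimal sketch under assumptions: the state names, the struct and the helpers are hypothetical, and the error codes are illustrative choices rather than anything the proposal specifies.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-device DMA state; not taken from the proposal. */
enum dma_state {
    DMA_UNBOUND,   /* not yet bound to the IOMMU fd */
    DMA_BLOCKED,   /* bound: sits in the block-dma domain, no valid mapping */
    DMA_ATTACHED,  /* attached to an ioasid backed by a normal domain */
};

struct dev_ctx {
    enum dma_state state;
    int ioasid;    /* meaningful only in DMA_ATTACHED */
};

/* Binding places the device in the block-dma domain. */
static int dev_bind(struct dev_ctx *d)
{
    if (d->state != DMA_UNBOUND)
        return -EBUSY;
    d->state = DMA_BLOCKED;
    d->ioasid = -1;
    return 0;
}

/*
 * Attaching is allowed only after a successful bind. It leaves the
 * block-dma domain and then joins the target ioasid.
 */
static int dev_attach(struct dev_ctx *d, int ioasid)
{
    if (d->state != DMA_BLOCKED || ioasid < 0)
        return -EINVAL;
    d->state = DMA_ATTACHED;
    d->ioasid = ioasid;
    return 0;
}
```

Calling dev_attach() on a device that was never bound fails in this model, matching the rule from 1.2 that attaching is only allowed after binding.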

OK

Thanks

Eric

quoted

2. uAPI Proposal
----------------------

[...]

quoted

/*
  * Allocate an IOASID.
  *
  * IOASID is the FD-local software handle representing an I/O address
  * space. Each IOASID is associated with a single I/O page table. User
  * must call this ioctl to get an IOASID for every I/O address space that is
  * intended to be tracked by the kernel.
  *
  * User needs to specify the attributes of the IOASID and associated
  * I/O page table format information according to one or multiple devices
  * which will be attached to this IOASID right after. The I/O page table
  * is activated in the IOMMU when it's attached by a device. Incompatible

.. if not SW mediated

quoted

  * format between device and IOASID will lead to attaching failure.
  *
  * The root IOASID should always have a kernel-managed I/O page
  * table for safety. Locked page accounting is also conducted on the root.

The definition of root IOASID is not easily found in this spec. Maybe
this would deserve some clarification.

makes sense.

and thanks for other typo-related comments.

Thanks
Kevin
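
For readers following the IOASID allocation comment quoted above, here is a minimal user-space model of two of its statements: an IOASID is an FD-local handle tied to a single I/O page table format, and attaching a device with an incompatible format fails. Everything here is an assumption made for illustration (the table size, the format enum, the function names and the error codes); it is not the ioctl interface from the proposal.

```c
#include <assert.h>
#include <errno.h>

#define MAX_IOASID 8  /* arbitrary size for this sketch */

/* Hypothetical page table formats; FMT_NONE marks a free slot. */
enum pgtable_fmt { FMT_NONE = 0, FMT_A, FMT_B };

/* Per-fd context: IOASIDs are only indices into this fd's own table. */
struct iommu_fd_ctx {
    enum pgtable_fmt fmt[MAX_IOASID];
};

/* Allocate an FD-local IOASID bound to one page table format. */
static int ioasid_alloc(struct iommu_fd_ctx *ctx, enum pgtable_fmt fmt)
{
    if (fmt == FMT_NONE)
        return -EINVAL;
    for (int i = 0; i < MAX_IOASID; i++) {
        if (ctx->fmt[i] == FMT_NONE) {
            ctx->fmt[i] = fmt;
            return i;
        }
    }
    return -ENOSPC;
}

/* Attach succeeds only if the device format matches the IOASID format. */
static int ioasid_attach(const struct iommu_fd_ctx *ctx, int ioasid,
                         enum pgtable_fmt dev_fmt)
{
    if (ioasid < 0 || ioasid >= MAX_IOASID || ctx->fmt[ioasid] == FMT_NONE)
        return -ENOENT;
    if (ctx->fmt[ioasid] != dev_fmt)
        return -EINVAL; /* incompatible format: attaching fails */
    return 0;
}
```

The sketch leaves out the root/child relationship and locked page accounting, since their definitions are exactly what the review comments above ask to have clarified.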
