Thread (268 messages) 268 messages, 15 authors, 2021-06-08

RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs

From: Liu, Yi L <hidden>
Date: 2021-04-02 12:47:02
Also in: linux-iommu, lkml

Hi Jason,
From: Jason Gunthorpe <redacted>
Sent: Thursday, April 1, 2021 7:54 PM

On Thu, Apr 01, 2021 at 07:04:01AM +0000, Liu, Yi L wrote:
quoted
After reading your reply in https://lore.kernel.org/linux-
iommu/20210331123801.GD1463678-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org/#t
quoted
So you mean /dev/ioasid FD is per-VM instead of per-ioasid, so above
skeleton
quoted
doesn't suit your idea.
You can do it one PASID per FD or multiple PASID's per FD. Most likely
we will have high numbers of PASID's in a qemu process so I assume
that number of FDs will start to be a contraining factor, thus
multiplexing is reasonable.

It doesn't really change anything about the basic flow.

digging deeply into it either seems like a reasonable choice.
quoted
+-----------------------------+-----------------------------------------------+
|      userspace              |               kernel space                    |
+-----------------------------+-----------------------------------------------+
| ioasid_fd =                 | /dev/ioasid does below:                       |
| open("/dev/ioasid", O_RDWR);|   struct ioasid_fd_ctx {                      |
|                             |       struct list_head ioasid_list;           |
|                             |       ...                                     |
|                             |   } ifd_ctx; // ifd_ctx is per ioasid_fd      |
Sure, possibly an xarray not a list
quoted
+-----------------------------+-----------------------------------------------+
| ioctl(ioasid_fd,            | /dev/ioasid does below:                       |
|       ALLOC, &ioasid);      |   struct ioasid_data {                        |
|                             |       ioasid_t ioasid;                        |
|                             |       struct list_head device_list;           |
|                             |       struct list_head next;                  |
|                             |       ...                                     |
|                             |   } id_data; // id_data is per ioasid         |
|                             |                                               |
|                             |   list_add(&id_data.next,                     |
|                             |            &ifd_ctx.ioasid_list);
|
Yes, this should have a kref in it too
quoted
+-----------------------------+-----------------------------------------------+
| ioctl(device_fd,            | VFIO does below:                              |
|       DEVICE_ALLOW_IOASID,  | 1) get ioasid_fd, check if ioasid_fd is valid |
|       ioasid_fd,            | 2) check if ioasid is allocated from ioasid_fd|
|       ioasid);              | 3) register device/domain info to /dev/ioasid |
|                             |    tracked in id_data.device_list             |
|                             | 4) record the ioasid in VFIO's per-device     |
|                             |    ioasid list for future security check      |
You would provide a function that does steps 1&2 look at eventfd for
instance.

I'm not sure we need to register the device with the ioasid. device
should incr the kref on the ioasid_data at this point.
quoted
+-----------------------------+-----------------------------------------------+
| ioctl(ioasid_fd,            | /dev/ioasid does below:                       |
|       BIND_PGTBL,           | 1) find ioasid's id_data                      |
|       pgtbl_data,           | 2) loop the id_data.device_list and tell iommu|
|       ioasid);              |    give ioasid access to the devices
|
This seems backwards, DEVICE_ALLOW_IOASID should tell the iommu to
give the ioasid to the device.

Here the ioctl should be about assigning a memory map from the the
current
mm_struct to the pasid
quoted
+-----------------------------+-----------------------------------------------+
| ioctl(ioasid_fd,            | /dev/ioasid does below:                       |
|       UNBIND_PGTBL,         | 1) find ioasid's id_data                      |
|       ioasid);              | 2) loop the id_data.device_list and tell iommu|
|                             |    clear ioasid access to the devices         |
Also seems backwards. The ioctl here should be 'destroy ioasid' which
wipes out the page table, halts DMA access and parks the PASID until
all users are done.
quoted
+-----------------------------+-----------------------------------------------+
| ioctl(device_fd,            | VFIO does below:                              |
|      DEVICE_DISALLOW_IOASID,| 1) check if ioasid is associated in VFIO's    |
|       ioasid_fd,            |    device ioasid list.                        |
|       ioasid);              | 2) unregister device/domain info from         |
|                             |    /dev/ioasid, clear in id_data.device_list  |
This should disconnect the iommu and kref_put the ioasid_data
thanks for the comments, updated the skeleton a little bit, accepted your Xarray
and kref suggestion.

+-----------------------------+------------------------------------------------+
|      userspace              |               kernel space                     |
+-----------------------------+------------------------------------------------+
| ioasid_fd =                 | /dev/ioasid does below:                        |
| open("/dev/ioasid", O_RDWR);|   struct ioasid_fd_ctx {                       |
|                             |        struct xarray xa;                       |
|                             |       ...                                      |
|                             |   } ifd_ctx; // ifd_ctx is per ioasid_fd       |
+-----------------------------+------------------------------------------------+
| ioctl(ioasid_fd,            | /dev/ioasid does below:                        |
|       ALLOC, &ioasid);      |   struct ioasid_data {                         |
|                             |       ioasid_t ioasid;                         |
|                             |       refcount_t refs;                         |
|                             |       ...                                      |
|                             |   } id_data; // id_data is per ioasid          |
|                             |                                                |
|                             |   refcount_set(&id_data->refs, 1);             |
+-----------------------------+------------------------------------------------+
| ioctl(device_fd,            | VFIO does below:                               |
|       DEVICE_ALLOW_IOASID,  | 1) get ioasid_fd, check if ioasid_fd is valid  |
|       ioasid_fd,            | 2) check if ioasid is allocated from ioasid_fd |
|       ioasid);              | 3) inr refcount on the ioasid                  |
|                             | 4) tell iommu to give the ioasid to the device |
|                             |    by an iommu API. iommu driver needs to      |
|                             |    store the ioasid/device info in a per       |
|                             |    ioasid allow device list                    |
|                             | 5) record the ioasid in VFIO's per-device      |
|                             |    ioasid list for future security check       |
+-----------------------------+------------------------------------------------+
| ioctl(ioasid_fd,            | /dev/ioasid does below:                        |
|       BIND_PGTBL,           | 1) find ioasid's id_data                       |
|       pgtbl_data,           | 2) call into iommu driver with ioasid, pgtbl   |
|       ioasid);              |    data, iommu driver setup the PASID entry[1] |
|                             |    with the ioasid and the pgtbl_data          |
+-----------------------------+------------------------------------------------+
| ioctl(ioasid_fd,            | /dev/ioasid does below:                        |
|       CAHCE_INVLD,          | 1) find ioasid's id_data                       |
|       inv_data,             | 2) call into iommu driver with ioasid, inv     |
|       ioasid);              |    data, iommu driver invalidates cache        |
+-----------------------------+------------------------------------------------+
| ioctl(ioasid_fd,            | /dev/ioasid does below:                        |
|       UNBIND_PGTBL,         | 1) find ioasid's id_data                       |
|       ioasid);              | 2) call into iommu driver with ioasid, iommu   |
|                             |    driver destroy the PASID entry to block DMA |
|                             |    with this ioasid from device                |
+-----------------------------+------------------------------------------------+
| ioctl(device_fd,            | VFIO does below:                               |
|      DEVICE_DISALLOW_IOASID,| 1) check if ioasid is associated in VFIO's     |
|       ioasid_fd,            |    device ioasid list                          |
|       ioasid);              | 2) tell iommu driver to clear the device from  |
|                             |    its per-ioasid device allow list            |
|                             | 3) put refcount on the ioasid                  |
+-----------------------------+------------------------------------------------+
| ioctl(ioasid_fd,            | /dev/ioasid does below:                        |
|       FREE, ioasid);        |  list_del(&id_data.next);                      |
+-----------------------------+------------------------------------------------+

[1] PASID entry is an entry in a per-device PASID table, this is where the
    page table pointer is stored. e.g. guest cr3 page table pointer. Setup
    PASID entry in a device's PASID table means the access is finally grant
    in IOMMU side.

I kept FREE as it seems to be more symmetric since there is an ALLOC
exposed to userspace. But yeah, I'm open with removing it all the same
if it's really unnecessary per your opinion.

Need your help again on an open.
The major purpose of this series is to support vSVA for guest based on
nested translation. And there is another usage case which is also based
on nested translation but don't have an ioasid. And still, it needs the
bind/unbind_pgtbl, cache_invalidation uAPI. It is gIOVA support. In this
usage, page table is a guest IOVA page table, VMM needs to bind this page
table to host and enabled nested translation, also needs to do cache
invalidation when guest IOVA page table has changes. It's very similar
with the page table bind of vSVA. Only difference is there is no ioasid
in the gIOVA case. Instead, gIOVA case requires device information. But
with regards to the uAPI reusing, need to fit gIOVA to /dev/ioasid model.
As of now, I think it may require user space passes a device FD to the
BIND/UNBIND_PGTBL and CAHCE_INVLD ioctl, then iommu driver can bind the
gIOVA page table to a correct device. Not sure if it looks good. Do you
have any suggestion on it?

[...]
Include a sequence showing how the kvm FD is used to program the
vPASID to pPASID table that ENQCMD uses.

Show how dynamic authorization works based on requests from the
guest's vIOMMU
I'd like to see if the updated skeleton suits your idea first, then
draw a more complete flow to show this.

Regards,
Yi Liu
Jason
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help