RE: [PATCH V4 05/18] iommu/ioasid: Redefine IOASID set and allocation APIs
From: Tian, Kevin <hidden>
Date: 2021-05-12 00:40:18
Also in:
linux-iommu, lkml
> From: Jason Gunthorpe <redacted>
> Sent: Wednesday, May 12, 2021 8:25 AM
>
> On Wed, May 12, 2021 at 12:21:24AM +0000, Tian, Kevin wrote:
> > > Basically each RID knows based on its kernel drivers if it is a local or global RID and the ioasid knob can further fine tune this for any other specialty cases.
> >
> > It's fine if you insist on this way. Then we leave it to userspace to ensure same split range is used across devices when vIOMMU is concerned.
>
> I'm still confused why there is a split range needed.
A device could support both ENQCMD and non-ENQCMD submissions. For the ENQCMD path, the CPU provides a PASID translation mechanism (from guest PASID to host PASID). For the non-ENQCMD path, the guest driver directly programs the untranslated guest PASID into the device MMIO register. The host kernel only sets up the host PASID entry, which is the hwid for a given ioasid page table.

If we don't split the range, we have to assume guest PASID == host PASID, otherwise the non-ENQCMD path will fail. But exposing the host PASID to the guest breaks migration.

If we want to allow migration, then we need to support guest PASID != host PASID and make sure both entries point to the same page table, so that ENQCMD (host PASID) and non-ENQCMD (guest PASID) can both work. This requires a range split to avoid conflicts between host and guest PASIDs in the same space.
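To make the alias mapping concrete, here is a rough sketch in plain C. It is not kernel code: `struct pasid_table`, `pasid_alias_map()` and the choice of putting the guest-local range at the low end of the space are all assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define PASID_SPACE 64u  /* toy size, chosen only for this sketch */
#define LOCAL_RANGE 16u  /* hypothetical split: [0, 16) is guest-local */

/* Hypothetical per-RID PASID table; each entry points at an I/O page table. */
struct pasid_table {
    const void *pgtable[PASID_SPACE];
};

/* Guest PASIDs live in the local range, host PASIDs above it (assumption). */
static bool pasid_is_local(uint32_t pasid)
{
    return pasid < LOCAL_RANGE;
}

/*
 * Install the same page table under both the guest PASID (used untranslated
 * on the non-ENQCMD path) and the host PASID (used after CPU translation on
 * the ENQCMD path). Returns 0 on success, -1 if the split is violated or a
 * slot is already taken.
 */
static int pasid_alias_map(struct pasid_table *t, uint32_t guest_pasid,
                           uint32_t host_pasid, const void *pgtable)
{
    if (!pasid_is_local(guest_pasid))
        return -1;
    if (host_pasid >= PASID_SPACE || pasid_is_local(host_pasid))
        return -1;
    if (t->pgtable[guest_pasid] || t->pgtable[host_pasid])
        return -1;

    t->pgtable[guest_pasid] = pgtable;
    t->pgtable[host_pasid] = pgtable;
    return 0;
}
```

Because the two ranges are disjoint, a guest PASID can never collide with a host PASID handed out to another VM on the same RID, which is the whole point of the split.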
> > Please note such range split has to be enforced through vIOMMU which (e.g. on VT-d) includes a register to report available PASID space size (applying to all devices behind this vIOMMU) to the guest. The kernel just follows per-RID split info. If anything broken, the userspace just shoots its own foot.
>
> Is it because this specific vIOMMU protocol is limiting things?
When range split is enabled, we need a way to tell the guest about the local range size, so that guest PASIDs are allocated only within this range. We use the vIOMMU to expose this information.
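On the guest side this could look roughly like the following. Again just a sketch: `struct guest_pasid_pool` and both functions are hypothetical, and `reported_size` stands in for whatever value the guest reads from the vIOMMU.

```c
#include <stdint.h>
#include <string.h>

#define MAX_LOCAL 64u  /* upper bound for this toy allocator only */

/* Hypothetical guest-side PASID pool, bounded by the vIOMMU-reported size. */
struct guest_pasid_pool {
    uint32_t local_size;      /* size of the local range seen by the guest */
    uint8_t  used[MAX_LOCAL]; /* 1 if the PASID is currently allocated */
};

/* Returns 0 on success, -1 if the reported size does not fit the toy pool. */
static int guest_pasid_pool_init(struct guest_pasid_pool *p,
                                 uint32_t reported_size)
{
    if (reported_size == 0 || reported_size > MAX_LOCAL)
        return -1;
    p->local_size = reported_size;
    memset(p->used, 0, sizeof(p->used));
    return 0;
}

/* Returns a free guest PASID inside the local range, or -1 if exhausted. */
static int guest_pasid_alloc(struct guest_pasid_pool *p)
{
    for (uint32_t i = 0; i < p->local_size; i++) {
        if (!p->used[i]) {
            p->used[i] = 1;
            return (int)i;
        }
    }
    return -1;
}
```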
> > > > > It does need some user visible difference because SIOV/mdev is not migratable. Only the kernel can select a PASID, userspace (and hence the guest) shouldn't have the option to force a specific PASID as the PASID space is shared across the entire RID to all VMs using the mdev.
> > > >
> > > > not migratable only when you choose exposing host-allocated PASID into guest. However in the entire this proposal we actually virtualize PASIDs, letting the guest manage its own PASID space in all scenarios
> > >
> > > PASID cannot be virtualized without also using ENQCMD. A mdev that is using PASID without ENQCMD is non-migratable and this needs to be make visiable in the uAPI.
> >
> > No. without ENQCMD the PASID must be programmed to a mdev MMIO register. This operation is mediated then mdev driver can translate the PASID from virtual to real.
>
> That is probably unworkable with real devices, but if you do this you need to explicitly expose the vPASID to the mdev API somehow, and still the device needs to declare if it supports this, and devices that don't should still work in a non-migratable mode.
It's not necessary. For real devices we use alias mapping for both guest and host PASIDs, as explained above. Then we can have the guest always register its vPASID with the ioasid (just like mapping/unmapping GPA to HVA), and let the host drivers figure out whether that vPASID can be used as a real hwid. When it's considered virtual and a real hwid is allocated by the host, the mapping is saved under this ioasid, to be queried by device drivers if translation is required.

From this angle, the previous IOASID_SET_HWID possibly should be renamed to IOASID_SET_USER_HWID.

Thanks
Kevin
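A minimal sketch of the vPASID registration and lookup described above, in plain C. This is not the proposed uAPI: `struct ioasid_ctx`, `ioasid_set_user_hwid()` and `ioasid_get_real_hwid()` are hypothetical names, and the "can the vPASID be used as a real hwid" decision is reduced to a flag passed in by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

#define HWID_INVALID UINT32_MAX

/* Hypothetical per-ioasid bookkeeping kept by the host. */
struct ioasid_ctx {
    uint32_t user_hwid; /* vPASID registered by the guest */
    uint32_t real_hwid; /* PASID actually used by the hardware */
};

/*
 * Record the guest's vPASID. If the host driver decides the vPASID can be
 * used directly, it becomes the real hwid; otherwise the host-allocated
 * value is saved as the translation target.
 */
static void ioasid_set_user_hwid(struct ioasid_ctx *c, uint32_t vpasid,
                                 bool usable_as_real, uint32_t host_hwid)
{
    c->user_hwid = vpasid;
    c->real_hwid = usable_as_real ? vpasid : host_hwid;
}

/*
 * Device-driver query, e.g. when mediating an MMIO write that carries a
 * vPASID. Returns HWID_INVALID if the vPASID was never registered here.
 */
static uint32_t ioasid_get_real_hwid(const struct ioasid_ctx *c,
                                     uint32_t vpasid)
{
    return c->user_hwid == vpasid ? c->real_hwid : HWID_INVALID;
}
```

A mediating driver would call the lookup on each trapped PASID write and program the returned value into the device, so the guest only ever sees its own vPASID.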