Thread: 35 messages, 8 authors, 2021-01-22

RE: [PATCH v9 03/10] iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA

From: "Tian, Kevin" <kevin.tian@intel.com>
Date: 2021-01-13 08:11:38
Also in: linux-acpi, linux-devicetree, linux-iommu

From: Lu Baolu <baolu.lu@linux.intel.com>
Sent: Wednesday, January 13, 2021 10:50 AM

Hi Jean,

On 1/12/21 5:16 PM, Jean-Philippe Brucker wrote:
Hi Baolu,

On Tue, Jan 12, 2021 at 12:31:23PM +0800, Lu Baolu wrote:
Hi Jean,

On 1/8/21 10:52 PM, Jean-Philippe Brucker wrote:
Some devices manage I/O Page Faults (IOPF) themselves instead of relying
on PCIe PRI or Arm SMMU stall. Allow their drivers to enable SVA without
mandating IOMMU-managed IOPF. The other device drivers now need to first
enable IOMMU_DEV_FEAT_IOPF before enabling IOMMU_DEV_FEAT_SVA.

Signed-off-by: Jean-Philippe Brucker <redacted>
---
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Lu Baolu <baolu.lu@linux.intel.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zhangfei Gao <zhangfei.gao@linaro.org>
Cc: Zhou Wang <wangzhou1@hisilicon.com>
---
   include/linux/iommu.h | 20 +++++++++++++++++---
   1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 583c734b2e87..701b2eeb0dc5 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -156,10 +156,24 @@ struct iommu_resv_region {
   	enum iommu_resv_type	type;
   };
-/* Per device IOMMU features */
+/**
+ * enum iommu_dev_features - Per device IOMMU features
+ * @IOMMU_DEV_FEAT_AUX: Auxiliary domain feature
+ * @IOMMU_DEV_FEAT_SVA: Shared Virtual Addresses
+ * @IOMMU_DEV_FEAT_IOPF: I/O Page Faults such as PRI or Stall. Generally using
+ *			 %IOMMU_DEV_FEAT_SVA requires %IOMMU_DEV_FEAT_IOPF, but
+ *			 some devices manage I/O Page Faults themselves instead
+ *			 of relying on the IOMMU. When supported, this feature
+ *			 must be enabled before and disabled after
+ *			 %IOMMU_DEV_FEAT_SVA.
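
The ordering rule in the kernel-doc comment above can be sketched as a
small userspace model. This is only an illustration under stated
assumptions: every model_* name is hypothetical, nothing here is the
kernel API, and the error codes are chosen for the example rather than
taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for IOMMU_DEV_FEAT_IOPF and IOMMU_DEV_FEAT_SVA. */
enum model_feat { MODEL_FEAT_IOPF, MODEL_FEAT_SVA, MODEL_FEAT_COUNT };

struct model_dev {
	bool iommu_managed_iopf;	/* false: device handles faults itself */
	bool enabled[MODEL_FEAT_COUNT];
};

/* Enable a feature; SVA needs IOPF first when the IOMMU manages IOPF. */
static int model_enable(struct model_dev *dev, enum model_feat f)
{
	if (f == MODEL_FEAT_IOPF && !dev->iommu_managed_iopf)
		return -ENODEV;		/* IOPF feature not supported here */
	if (f == MODEL_FEAT_SVA && dev->iommu_managed_iopf &&
	    !dev->enabled[MODEL_FEAT_IOPF])
		return -EINVAL;		/* IOPF must be enabled before SVA */
	dev->enabled[f] = true;
	return 0;
}

/* Disable a feature; IOPF may only go away after SVA has been disabled. */
static int model_disable(struct model_dev *dev, enum model_feat f)
{
	if (f == MODEL_FEAT_IOPF && dev->enabled[MODEL_FEAT_IOPF] &&
	    dev->enabled[MODEL_FEAT_SVA])
		return -EBUSY;		/* SVA still relies on IOPF */
	dev->enabled[f] = false;
	return 0;
}

/* Walk the expected sequence for a device with IOMMU-managed IOPF. */
static void model_demo(void)
{
	struct model_dev dev = { .iommu_managed_iopf = true };

	assert(model_enable(&dev, MODEL_FEAT_SVA) == -EINVAL);
	assert(model_enable(&dev, MODEL_FEAT_IOPF) == 0);
	assert(model_enable(&dev, MODEL_FEAT_SVA) == 0);
	assert(model_disable(&dev, MODEL_FEAT_IOPF) == -EBUSY);
	assert(model_disable(&dev, MODEL_FEAT_SVA) == 0);
	assert(model_disable(&dev, MODEL_FEAT_IOPF) == 0);
}
```

A device that handles faults itself would set iommu_managed_iopf to
false in this model and enable SVA directly.
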
Is this only for SVA? We may see more scenarios of using IOPF. For
example, when passing through devices to user level, the user's pages
could be managed dynamically instead of being allocated and pinned
statically.

Hm, isn't that precisely what SVA does? I don't understand the
difference. That said, FEAT_IOPF doesn't have to be only for SVA. It could
later be used as a prerequisite for some other feature. For special cases,
device drivers can always use the iommu_register_device_fault_handler()
API and handle faults themselves.

From the perspective of the IOMMU, there is a small difference between
these two. For SVA, the page table comes from the CPU side, so the IOMMU
only needs to call handle_mm_fault(). For the above pass-through case, the
page table comes from the IOMMU side, so the device driver (probably VFIO)
needs to register a fault handler and call iommu_map/unmap() to serve the
page faults.
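
The two paths described above can be sketched as a userspace model. This
is a minimal illustration, not kernel code: model_handle_mm_fault() and
model_iommu_map() are hypothetical stand-ins for handle_mm_fault() and
iommu_map(), and the fixed-size arrays only mimic page tables.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGES 8

/* Who owns the page table the device walks (hypothetical names). */
enum model_pgtable_owner { MODEL_PGTABLE_CPU, MODEL_PGTABLE_IOMMU };

struct model_domain {
	enum model_pgtable_owner owner;
	bool cpu_present[MODEL_PAGES];	/* mimics the process page table */
	bool iommu_mapped[MODEL_PAGES];	/* mimics explicit IOMMU mappings */
};

/* Stand-in for handle_mm_fault(): fault the page in on the CPU side. */
static int model_handle_mm_fault(struct model_domain *d, size_t pfn)
{
	d->cpu_present[pfn] = true;
	return 0;
}

/* Stand-in for iommu_map(): install an explicit IOMMU-side mapping. */
static int model_iommu_map(struct model_domain *d, size_t pfn)
{
	d->iommu_mapped[pfn] = true;
	return 0;
}

/* Dispatch a recoverable fault according to who owns the page table. */
static int model_handle_iopf(struct model_domain *d, size_t pfn)
{
	if (pfn >= MODEL_PAGES)
		return -EFAULT;		/* address outside the modelled range */
	if (d->owner == MODEL_PGTABLE_CPU)
		return model_handle_mm_fault(d, pfn);	/* SVA path */
	return model_iommu_map(d, pfn);	/* driver (e.g. VFIO) path */
}

/* The device can translate a page once the owning table contains it. */
static bool model_translates(const struct model_domain *d, size_t pfn)
{
	return d->owner == MODEL_PGTABLE_CPU ? d->cpu_present[pfn]
					     : d->iommu_mapped[pfn];
}

/* Exercise both paths once. */
static void model_demo(void)
{
	struct model_domain sva = { .owner = MODEL_PGTABLE_CPU };
	struct model_domain pt = { .owner = MODEL_PGTABLE_IOMMU };

	assert(!model_translates(&sva, 3));
	assert(model_handle_iopf(&sva, 3) == 0);
	assert(model_translates(&sva, 3) && !sva.iommu_mapped[3]);

	assert(model_handle_iopf(&pt, 5) == 0);
	assert(model_translates(&pt, 5) && !pt.cpu_present[5]);
}
```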

If we think about the nested mode (or dual-stage translation), it's more
complicated since the kernel (probably VFIO) handles the second level
page faults, while the first level page faults need to be delivered to
user-level guest. Obviously, this hasn't been fully implemented in any
IOMMU driver.

Thinking more, the confusion might come from the fact that we mixed
hardware capability with software capability. IOMMU_FEAT describes
the hardware capability. When FEAT_IOPF is set, it purely means that
whatever page faults are enabled by the software are routed through the
IOMMU. Nothing more. Then the software (IOMMU drivers) may choose to
support only limited faulting scenarios and then evolve to support more
complex usages gradually. For example, the intel-iommu driver only
supports 1st-level faults (thus SVA) for now, while FEAT_IOPF as a
separate feature may give the impression that 2nd-level faults are also
allowed. From this angle, once we start to separate page faults from SVA,
we may also need a way to report the software capability (e.g. a set of
faulting categories) and also extend iommu_register_device_fault_handler()
to allow specifying which categories are enabled. The example categories
could be:

- IOPF_BIND, for page tables which are bound/linked to the IOMMU.
Applies to the bare metal SVA and guest SVA cases;
- IOPF_MAP, for page tables which are managed through explicit IOMMU
map interfaces. Applies to removing the VFIO pinning restriction.

Both categories can be enabled together in nested translation, with
additional information provided to differentiate them in the fault
information. Using the paging/staging level doesn't make much sense, as
that is the IOMMU driver's internal knowledge, e.g. the VT-d driver plans
to use the 1st level for GPA if there is no nesting and then switch to the
2nd level when nesting is enabled.
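
One possible shape for such category reporting and registration,
sketched as a userspace model. Everything here is an assumption made
for illustration: the MODEL_IOPF_* flags mirror the IOPF_BIND/IOPF_MAP
names proposed above, and model_register_fault_handler() is a
hypothetical variant, not the existing
iommu_register_device_fault_handler() signature.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical fault categories, named after the proposal above. */
enum model_iopf_cat {
	MODEL_IOPF_BIND = 1 << 0,	/* page tables bound/linked to the IOMMU */
	MODEL_IOPF_MAP  = 1 << 1,	/* page tables managed via explicit map */
};

struct model_fault {
	unsigned long addr;
	unsigned int category;		/* exactly one MODEL_IOPF_* bit */
};

typedef int (*model_fault_handler_t)(const struct model_fault *f, void *data);

struct model_iopf_dev {
	unsigned int supported;		/* software capability of the driver */
	unsigned int enabled;		/* categories requested at registration */
	model_fault_handler_t handler;
	void *data;
};

/* Register a handler for a set of categories; reject unsupported ones. */
static int model_register_fault_handler(struct model_iopf_dev *dev,
					model_fault_handler_t handler,
					unsigned int categories, void *data)
{
	if (!handler || !categories || (categories & ~dev->supported))
		return -EINVAL;
	if (dev->handler)
		return -EBUSY;		/* one handler per device in this model */
	dev->handler = handler;
	dev->enabled = categories;
	dev->data = data;
	return 0;
}

/* Deliver a fault only if its category was enabled at registration. */
static int model_report_fault(struct model_iopf_dev *dev,
			      const struct model_fault *f)
{
	if (!dev->handler || !(f->category & dev->enabled))
		return -ENODEV;
	return dev->handler(f, dev->data);
}

/* Example handler: count faults per category (index 0 BIND, 1 MAP). */
static int model_count_handler(const struct model_fault *f, void *data)
{
	unsigned int *counts = data;

	counts[f->category == MODEL_IOPF_MAP]++;
	return 0;
}

/* A driver that only supports bound page tables rejects IOPF_MAP. */
static void model_demo(void)
{
	unsigned int counts[2] = { 0, 0 };
	struct model_iopf_dev dev = { .supported = MODEL_IOPF_BIND };
	struct model_fault bind = { 0x1000, MODEL_IOPF_BIND };
	struct model_fault map = { 0x2000, MODEL_IOPF_MAP };

	assert(model_register_fault_handler(&dev, model_count_handler,
			MODEL_IOPF_BIND | MODEL_IOPF_MAP, counts) == -EINVAL);
	assert(model_register_fault_handler(&dev, model_count_handler,
			MODEL_IOPF_BIND, counts) == 0);
	assert(model_report_fault(&dev, &bind) == 0);
	assert(model_report_fault(&dev, &map) == -ENODEV);
	assert(counts[0] == 1 && counts[1] == 0);
}
```

Carrying the category in the fault record, rather than a paging level,
keeps the handler independent of which level the IOMMU driver chose
internally.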

Thanks
Kevin
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel