Thread (22 messages) 22 messages, 8 authors, 2016-12-12

[RFC v3 00/10] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions

From: eric.auger@redhat.com (Auger Eric)
Date: 2016-12-08 13:37:05
Also in: kvm, linux-iommu, lkml

Hi Robin,

On 08/12/2016 14:14, Robin Murphy wrote:
On 08/12/16 09:36, Auger Eric wrote:
quoted
Hi,

On 15/11/2016 14:09, Eric Auger wrote:
quoted
Following LPC discussions, we now report reserved regions through
iommu-group sysfs reserved_regions attribute file.

While I am respinning this series into v4, here is a tentative summary
of technical topics for which no consensus was reached at this point.

1) Shall we report the usable IOVA range instead of reserved IOVA
   ranges. Not discussed at/after LPC.
   x I currently report reserved regions. Alex expressed the need to
     report the full usable IOVA range instead (x86 min-max range
     minus MSI APIC window). I think this is meaningful for ARM
     too where arm-smmu might not support the full 64b range.
   x Any objection we report the usable IOVA regions instead?
The issue with that is that we can't actually report "the usable
regions" at the moment, as that involves pulling together disjoint
properties of arbitrary hardware unrelated to the IOMMU. We'd be
reporting "the not-definitely-unusable regions, which may have some
unusable holes in them still". That seems like an ABI nightmare - I'd
still much rather say "here are some, but not necessarily all, regions
you definitely can't use", because saying "here are some regions which
you might be able to use most of, probably" is what we're already doing
today, via a single implicit region from 0 to ULONG_MAX ;)

The address space limits are definitely useful to know, but I think it
would be better to expose them separately to avoid the ambiguity. At
worst, I guess it would be reasonable to express the limits via an
"out-of-range" reserved region type for 0 to $base and $top to
ULONG-MAX. To *safely* expose usable regions, we'd have to start out
with a very conservative assumption (e.g. only IOVAs matching physical
RAM), and only expand them once we're sure we can detect every possible
bit of problematic hardware in the system - that's just too limiting to
be useful. And if we expose something knowingly inaccurate, we risk
having another "bogoMIPS in /proc/cpuinfo" ABI burden on our hands, and
nobody wants that...
Makes sense to me. "out-of-range reserved region type for 0 to $base and
$top to ULONG-MAX" can be an alternative to fulfill the requirement.
quoted
2) Shall the kernel check collision with MSI window* when userspace
   calls VFIO_IOMMU_MAP_DMA?
   Joerg/Will No; Alex yes
   *for IOVA regions consumed downstream to the IOMMU: everyone says NO
If we're starting off by having the SMMU drivers expose it as a fake
fixed region, I don't think we need to worry about this yet. We all seem
to agree that as long as we communicate the fixed regions to userspace,
it's then userspace's job to work around them. Let's come back to this
one once we actually get to the point of dynamically sizing and
allocating 'real' MSI remapping region(s).

Ultimately, the kernel *will* police collisions either way, because an
underlying iommu_map() is going to fail if overlapping IOVAs are ever
actually used, so it's really just a question of whether to have a more
user-friendly failure mode.
That's true on ARM but not on x86 where the APIC MSI region is not
mapped I think.
quoted
3) RMRR reporting in the iommu group sysfs? Joerg: yes; Don: no
   My current series does not expose them in iommu group sysfs.
   I understand we can expose the RMRR regions in the iomm group sysfs
   without necessarily supporting RMRR requiring device assignment.
   We can also add this support later.
As you say, reporting them doesn't necessitate allowing device
assignment, and it's information which can already be easily grovelled
out of dmesg (for intel-iommu at least) - there doesn't seem to be any
need to hide them, but the x86 folks can have the final word on that.
agreed

Thanks

Eric
Robin.
quoted
Thanks

Eric

quoted
Reserved regions are populated through the IOMMU get_resv_region callback
(former get_dm_regions), now implemented by amd-iommu, intel-iommu and
arm-smmu.

The intel-iommu reports the [FEE0_0000h - FEF0_000h] MSI window as an
IOMMU_RESV_NOMAP reserved region.

arm-smmu reports the MSI window (arbitrarily located at 0x8000000 and
1MB large) and the PCI host bridge windows.

The series integrates a not officially posted patch from Robin:
"iommu/dma: Allow MSI-only cookies".

This series currently does not address IRQ safety assessment.

Best Regards

Eric

Git: complete series available at
https://github.com/eauger/linux/tree/v4.9-rc5-reserved-rfc-v3

History:
RFC v2 -> v3:
- switch to an iommu-group sysfs API
- use new dummy allocator provided by Robin
- dummy allocator initialized by vfio-iommu-type1 after enumerating
  the reserved regions
- at the moment ARM MSI base address/size is left unchanged compared
  to v2
- we currently report reserved regions and not usable IOVA regions as
  requested by Alex

RFC v1 -> v2:
- fix intel_add_reserved_regions
- add mutex lock/unlock in vfio_iommu_type1


Eric Auger (10):
  iommu/dma: Allow MSI-only cookies
  iommu: Rename iommu_dm_regions into iommu_resv_regions
  iommu: Add new reserved IOMMU attributes
  iommu: iommu_alloc_resv_region
  iommu: Do not map reserved regions
  iommu: iommu_get_group_resv_regions
  iommu: Implement reserved_regions iommu-group sysfs file
  iommu/vt-d: Implement reserved region get/put callbacks
  iommu/arm-smmu: Implement reserved region get/put callbacks
  vfio/type1: Get MSI cookie

 drivers/iommu/amd_iommu.c       |  20 +++---
 drivers/iommu/arm-smmu.c        |  52 +++++++++++++++
 drivers/iommu/dma-iommu.c       | 116 ++++++++++++++++++++++++++-------
 drivers/iommu/intel-iommu.c     |  50 ++++++++++----
 drivers/iommu/iommu.c           | 141 ++++++++++++++++++++++++++++++++++++----
 drivers/vfio/vfio_iommu_type1.c |  26 ++++++++
 include/linux/dma-iommu.h       |   7 ++
 include/linux/iommu.h           |  49 ++++++++++----
 8 files changed, 391 insertions(+), 70 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo at vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help