Re: [PATCH v3 3/7] PCI: OF: Allow endpoints to bypass the iommu
From: Robin Murphy <robin.murphy@arm.com>
Date: 2018-10-18 10:47:18
On 17/10/18 16:14, Michael S. Tsirkin wrote:
> On Mon, Oct 15, 2018 at 08:46:41PM +0100, Jean-philippe Brucker wrote:
>> [Replying with my personal address because we're having SMTP issues]
>>
>> On 15/10/2018 11:52, Michael S. Tsirkin wrote:
>>> On Fri, Oct 12, 2018 at 02:41:59PM -0500, Bjorn Helgaas wrote:
>>>> s/iommu/IOMMU/ in subject
>>>>
>>>> On Fri, Oct 12, 2018 at 03:59:13PM +0100, Jean-Philippe Brucker wrote:
>>>>> Using the iommu-map binding, endpoints in a given PCI domain can be managed by different IOMMUs. Some virtual machines may allow a subset of endpoints to bypass the IOMMU. In some case the IOMMU itself is presented
>>>>
>>>> s/case/cases/
>>>>
>>>>> as a PCI endpoint (e.g. AMD IOMMU and virtio-iommu). Currently, when a PCI root complex has an iommu-map property, the driver requires all endpoints to be described by the property. Allow the iommu-map property to have gaps.
>>>>
>>>> I'm not an IOMMU or virtio expert, so it's not obvious to me why it is safe to allow devices to bypass the IOMMU. Does this mean a typo in iommu-map could inadvertently allow devices to bypass it?
>>>
>>> Thinking about this comment, I would like to ask: can't the virtio device indicate the ranges in a portable way? This would minimize the dependency on dt bindings and ACPI, enabling support for systems that have neither but do have virtio e.g. through pci.
>>
>> I thought about adding a PROBE request for this in virtio-iommu, but it wouldn't be usable by a Linux guest because of a bootstrapping problem.
>
> Hmm. At some level it seems wrong to design hardware interfaces around how Linux happens to probe things. That can change at any time ...
This isn't Linux-specific though. In general it's somewhere between difficult and impossible to pull in an IOMMU underneath a device after the device is active, so if any OS wants to use an IOMMU, it's going to want to know up-front that it's there and which devices it translates, so that it can program said IOMMU appropriately *before* potentially starting DMA and/or interrupts from the relevant devices. Linux happens to do things in that order (either by firmware-driven probe-deferral or just perilous initcall ordering) because it is the only reasonable order in which to do them.

AFAIK the platforms which don't rely on any firmware description of their IOMMU tend to have a fairly static system architecture (such that the OS simply makes hard-coded assumptions), so it's not necessarily entirely clear how they would cope with virtio-iommu either way.

Robin.
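The ordering argument above can be modelled in a few lines. This is a simplified, hypothetical Python model, not kernel code: `Device`, `probe_all` and `PROBE_DEFER` are invented names, and the real driver core instead keeps a list of deferred devices that it retries whenever another driver binds. It shows how a static firmware description (which IOMMU translates which device) lets the OS hold a device back until its IOMMU is ready, while a device not covered by the description probes straight away.

```python
from dataclasses import dataclass
from typing import Optional

PROBE_OK = 0
PROBE_DEFER = 1  # hypothetical stand-in for the kernel's -EPROBE_DEFER


@dataclass
class Device:
    name: str
    iommu: Optional[str]  # name of the translating IOMMU, or None for bypass
    probed: bool = False


def probe_all(devices: list[Device]) -> list[str]:
    """Probe devices, deferring any whose IOMMU is not up yet.

    Retries deferred devices until a full pass makes no progress and
    returns the order in which devices finished probing.
    """
    by_name = {d.name: d for d in devices}
    order: list[str] = []

    def try_probe(dev: Device) -> int:
        # The static description says which IOMMU translates this device,
        # so don't start it (and therefore its DMA) before that IOMMU.
        if dev.iommu is not None and not by_name[dev.iommu].probed:
            return PROBE_DEFER
        dev.probed = True
        order.append(dev.name)
        return PROBE_OK

    progress = True
    while progress:
        progress = False
        for dev in devices:
            if not dev.probed and try_probe(dev) == PROBE_OK:
                progress = True
    return order


# The NIC is enumerated first but sits behind the virtio-iommu.
devices = [
    Device("nic", iommu="virtio-iommu"),
    Device("bypass-dev", iommu=None),    # not described: bypasses translation
    Device("virtio-iommu", iommu=None),  # the IOMMU is itself a PCI endpoint
]
print(probe_all(devices))  # ['bypass-dev', 'virtio-iommu', 'nic']
```

Without the up-front description, the model would have no way to know the NIC must wait, which is exactly the situation a description discovered only after the virtio-iommu driver loads would create.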
>> Early on, Linux needs a description of device dependencies, to determine in which order to probe them. If the device dependency was described by virtio-iommu itself, the guest could for example initialize a NIC, allocate buffers and start DMA on the physical address space (which aborts if the IOMMU implementation disallows DMA by default), only to find out once the virtio-iommu module is loaded that it needs to cancel all DMA and reconfigure the NIC. With a static description such as iommu-map in DT or ACPI remapping tables, the guest can defer probing of the NIC until the IOMMU is initialized.
>>
>> Thanks,
>> Jean
>
> Could you point me at the code you refer to here?
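The patch description quoted earlier proposes letting iommu-map have gaps. As a rough illustration, here is a hypothetical Python model of looking up a PCI Requester ID (RID) in an iommu-map-style table of (rid-base, iommu, iommu-base, length) entries, with the RID first masked as iommu-map-mask would do. `MapEntry`, `map_rid` and the `allow_gaps` flag are invented names, and this is not the kernel's implementation. With gaps allowed, an uncovered RID means the endpoint bypasses the IOMMU; `allow_gaps=False` models the current behaviour, where every endpoint must be described.

```python
from typing import NamedTuple, Optional


class MapEntry(NamedTuple):
    rid_base: int    # first Requester ID covered by this entry
    iommu: str       # stands in for the IOMMU phandle
    iommu_base: int  # IOMMU specifier (e.g. stream ID) for rid_base
    length: int      # number of consecutive RIDs covered


def map_rid(iommu_map: list[MapEntry], rid: int, mask: int = 0xFFFF,
            allow_gaps: bool = True) -> Optional[tuple[str, int]]:
    """Translate a RID through an iommu-map-style table.

    Returns (iommu, specifier) for a covered RID. For an uncovered RID,
    returns None (the device bypasses the IOMMU) when allow_gaps is True,
    otherwise raises LookupError (every endpoint must be described).
    """
    masked = rid & mask
    for e in iommu_map:
        if e.rid_base <= masked < e.rid_base + e.length:
            return e.iommu, e.iommu_base + (masked - e.rid_base)
    if allow_gaps:
        return None
    raise LookupError(f"RID {rid:#06x} not covered by iommu-map")


# Hypothetical layout: RIDs 0x08-0x0f (e.g. the virtio-iommu itself) are a gap.
iommu_map = [
    MapEntry(rid_base=0x00, iommu="viommu", iommu_base=0x00, length=0x08),
    MapEntry(rid_base=0x10, iommu="viommu", iommu_base=0x10, length=0xFFF0),
]
print(map_rid(iommu_map, 0x0003))  # ('viommu', 3)
print(map_rid(iommu_map, 0x0008))  # None
```

Bjorn's concern is easy to see in this model: a typo that shortens an entry's length silently turns the affected RIDs into bypass instead of producing a lookup error.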