[PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API
From: christian.koenig@amd.com (Christian König)
Date: 2018-09-08 07:29:36
Also in:
kvm, linux-acpi, linux-devicetree, linux-iommu, linux-mm, linux-pci
On 07.09.2018 at 23:25, Jacob Pan wrote:
> On Fri, 7 Sep 2018 20:02:54 +0200 Christian König [off-list ref] wrote:
>> [SNIP]
>>> iommu-sva expects everywhere that the device has an iommu_domain; it's
>>> the first thing we check on entry. Bypassing all of this would call
>>> idr_alloc() directly, and wouldn't have any code in common with the
>>> current iommu-sva. So it seems like you need a layer on top of
>>> iommu-sva calling idr_alloc() when an IOMMU isn't present, but I don't
>>> think it should be in drivers/iommu/.
>>
>> In this case I question whether the PASID handling should be under
>> drivers/iommu at all.
>>
>> See, I can have a mix of VM contexts which are bound to processes (a
>> few) and VM contexts which are standalone and don't care about a
>> process address space. But for each VM context I need a distinct PASID
>> for the hardware to work.
>>
>> I can live with it if we say that when the IOMMU is completely disabled
>> we use a simple ida to allocate them, but when the IOMMU is enabled I
>> certainly need a way to reserve a PASID without an associated process.
>
> VT-d would also have such a requirement. There is a virtual command
> register for allocating and freeing PASIDs for VM use. When that PASID
> allocation request gets propagated to the host IOMMU driver, we need to
> allocate a PASID w/o an mm.
>
> If the PASID allocation is done via VFIO, can we have an FD track the
> PASID life cycle instead of mm_exit()? i.e. all FDs get closed before
> mm_exit, I assume?
Yes, exactly. I just need a PASID which is never used by the OS for a
process and which we can easily give back when the last FD reference is
closed.
>>>> 3. Even after destruction of a process address space we need some
>>>> grace period before a PASID is reused, because the PASID may still be
>>>> referenced in some hardware queues etc...
>>>> At bare minimum, all device drivers using process binding need to
>>>> explicitly notify the core when they are done with a PASID.
>>>
>>> Right, much of the horribleness in iommu-sva deals with this:
>>>
>>> The process dies, iommu-sva is notified and calls the mm_exit()
>>> function passed by the device driver to iommu_sva_device_init(). In
>>> mm_exit() the device driver needs to clear any reference to the PASID
>>> in hardware and in its own structures. When the device driver returns
>>> from mm_exit(), it effectively tells the core that it has finished
>>> using the PASID, and iommu-sva can reuse the PASID for another
>>> process. mm_exit() is allowed to block, so the device driver has time
>>> to clean up and flush the queues.
>>>
>>> If the device driver finishes using the PASID before the process
>>> exits, it just calls unbind().
>>
>> Exactly that is what Michal Hocko is probably not going to like at all.
>>
>> Can we have a different approach where each driver is informed by
>> mm_exit(), but needs to explicitly call unbind() before a PASID is
>> reused?
>>
>> During that teardown transition it would be ideal if that PASID only
>> pointed to a dummy root page directory with only invalid entries.
>
> I guess this can be vendor specific. In VT-d I plan to mark the PASID
> entry not present and disable fault reporting while draining the
> remaining activities.
Sounds good to me. The point is that, at least in the case where the
process was killed by the OOM killer, we should not block in mm_exit().

Instead, operations issued by the process to a device driver which uses
SVA need to be terminated as soon as possible to make sure that the OOM
killer can make progress.

Thanks,
Christian.