Thread (113 messages) 113 messages, 13 authors, 2018-09-13

[PATCH v2 01/40] iommu: Introduce Shared Virtual Addressing API

From: Jean-Philippe Brucker <hidden>
Date: 2018-09-12 12:41:04
Also in: kvm, linux-acpi, linux-devicetree, linux-iommu, linux-mm, linux-pci

On 08/09/2018 08:29, Christian K?nig wrote:
Yes, exactly. I just need a PASID which is never used by the OS for a 
process and we can easily give that back when the last FD reference is 
closed.
Alright, iommu-sva can get its PASID from this external allocator as
well, as long as it has an interface similar to idr. Where would it go,
drivers/base/, mm/, kernel/...?
quoted
quoted
quoted
The process dies, iommu-sva is notified and calls the mm_exit()
function passed by the device driver to iommu_sva_device_init(). In
mm_exit() the device driver needs to clear any reference to the
PASID in hardware and in its own structures. When the device driver
returns from mm_exit(), it effectively tells the core that it has
finished using the PASID, and iommu-sva can reuse the PASID for
another process. mm_exit() is allowed to block, so the device
driver has time to clean up and flush the queues.

If the device driver finishes using the PASID before the process
exits, it just calls unbind().
Exactly that's what Michal Hocko is probably going to not like at all.

Can we have a different approach where each driver is informed by the
mm_exit(), but needs to explicitly call unbind() before a PASID is
reused?
It's awful from the IOMMU driver perspective. In addition to "enabled"
and "disabled" PASID states, you add "disabled but DMA still running
normally". Between that new state and "disabled", the IOMMU will be
flooded by translation faults (non-recoverable ones), which it needs to
ignore instead of reporting to the kernel. Not all IOMMUs can deal with
this in hardware (SMMU and VT-d can quiesce translation faults
per-PASID, but I don't think AMD IOMMU can.) Some drivers will have to
filter fault events themselves, depending on the PASID state.
quoted
quoted
During that teardown transition it would be ideal if that PASID only
points to a dummy root page directory with only invalid entries.
I guess this can be vendor specific, In VT-d I plan to mark PASID
entry not present and disable fault reporting while draining remaining
activities.
Sounds good to me.

Point is at least in the case where the process was killed by the OOM 
killer we should not block in mm_exit().

Instead operations issued by the process to a device driver which uses 
SVA needs to be terminated as soon as possible to make sure that the OOM 
killer can advance.
I don't see how we're preventing the OOM killer from advancing, so I'm
looking for a stronger argument that justifies adding this complexity to
IOMMU drivers. Time limit of the release MMU notifier, locking
requirement, a concrete example where things break, even a comment
somewhere in mm/ would do...

In my tests I can't manage to disturb the OOM killer, but I could be
missing some special case. Even if the mm_exit() callback
(unrealistically) sleeps for 60 seconds, the OOM killer isn't affected
because oom_reap_task_mm() wipes the victim's address space in another
thread, either before or while the release notifier is running.

Thanks,
Jean
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help