[PATCH v5 1/3] iommu: Implement common IOMMU ops for DMA mapping

From: robin.murphy@arm.com (Robin Murphy)
Date: 2015-08-11 13:31:34
Also in: linux-iommu

Hi Joerg,

On 11/08/15 10:37, Joerg Roedel wrote:

> On Fri, Aug 07, 2015 at 02:38:39PM +0100, Robin Murphy wrote:
>> Indeed, DMA_DEBUG will check that a driver is making DMA API calls
>> to the arch code in the right way; this is a different check, to
>> catch things like the arch code passing the wrong domain into this
>> layer, or someone else having messed directly with the domain via
>> the IOMMU API. If the iommu_unmap doesn't match the IOVA region we
>> looked up, that means the IOMMU page tables have somehow become
>> inconsistent with the IOVA allocator, so we are in an unrecoverable
>> situation where we can no longer be sure what devices have access
>> to. That's bad.
>
> Sure, but the BUG_ON would also trigger on things like a double-free,
> which is bad to handle as a BUG_ON. A WARN_ON for this is sufficient.

Oh dear, it gets even better than that; in the case of a simple 
double-unmap where the IOVA is already free, we wouldn't even get as far 
as that check because we'd die calling iova_size(NULL). How on Earth did 
I get to v5 without spotting that? :(
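
To make the failure mode concrete, here is a rough userspace sketch of it. It is not the patch code: every toy_* name is made up for illustration, and the table-based lookup only stands in for the real IOVA allocator. It shows the lookup coming back NULL on a second unmap, and a guard that warns and bails out before the size helper is ever called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Toy stand-in for an allocated IOVA range (hypothetical, not the kernel struct). */
struct toy_iova {
	unsigned long pfn_lo;
	unsigned long pfn_hi;
	int in_use;
};

#define TOY_SLOTS 4
static struct toy_iova toy_table[TOY_SLOTS];

/* Record a live range; returns 0 on success, -1 if the table is full. */
int toy_map(unsigned long pfn_lo, unsigned long pfn_hi)
{
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		if (!toy_table[i].in_use) {
			toy_table[i].pfn_lo = pfn_lo;
			toy_table[i].pfn_hi = pfn_hi;
			toy_table[i].in_use = 1;
			return 0;
		}
	}
	return -1;
}

/* Look up the live range covering pfn; NULL if it is not (or no longer) allocated. */
struct toy_iova *toy_find_iova(unsigned long pfn)
{
	for (size_t i = 0; i < TOY_SLOTS; i++) {
		struct toy_iova *iova = &toy_table[i];

		if (iova->in_use && pfn >= iova->pfn_lo && pfn <= iova->pfn_hi)
			return iova;
	}
	return NULL;
}

/* Size in pages. The caller must pass a valid range: a NULL here is the crash. */
unsigned long toy_iova_size(const struct toy_iova *iova)
{
	assert(iova != NULL);
	return iova->pfn_hi - iova->pfn_lo + 1;
}

/* Unmap with the guard: warn and return -1 instead of touching a NULL range. */
int toy_unmap(unsigned long pfn)
{
	struct toy_iova *iova = toy_find_iova(pfn);

	if (!iova) {
		fprintf(stderr, "warning: unmap of pfn %lu with no live IOVA\n", pfn);
		return -1;
	}
	printf("unmapping %lu page(s)\n", toy_iova_size(iova));
	iova->in_use = 0;
	return 0;
}
```

In this model, toy_map(16, 19) followed by two toy_unmap(16) calls returns 0 for the first unmap and -1 with a warning for the second, rather than dying in the size helper.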

Anyway, on reflection I think you're probably right; I've clearly been 
working on this for long enough to start falling into the "my thing is 
obviously more important than all the other things" trap.

>> AFAIK, yes (this is just a slight tidyup of the existing code that
>> 32-bit Exynos/Tegra/Rockchip/etc. devices are already using) - the
>> display guys want increasingly massive contiguous allocations for
>> framebuffers, layers, etc., so having IOMMU magic deal with that
>> saves CMA for non-IOMMU devices that really need it.
>
> Makes sense, I thought about something similar for x86 too to avoid the
> high-order allocations we currently do. I guess the buffer will later be
> mapped into the vmalloc space for the CPU?

Indeed - for non-coherent devices we have to remap all allocations 
(IOMMU or not) anyway in order to get a non-cacheable CPU mapping of the 
buffer, so having non-contiguous pages is no bother; for coherent 
devices we can just do the same thing but keep the vmalloc mapping 
cacheable. There's also the DMA_ATTR_NO_KERNEL_MAPPING case (e.g. GPU 
just wants a big buffer to render into and read back out again) where we 
wouldn't need a CPU address at all, although on arm64 vmalloc space is 
cheap enough that we've no plans to implement that at the moment.

Robin.
