Thread (90 messages), 17 authors, 2011-05-03

[Linaro-mm-sig] [RFC] ARM DMA mapping TODO, v1

From: laurent.pinchart@ideasonboard.com (Laurent Pinchart)
Date: 2011-05-03 15:05:12
Also in: lkml

On Wednesday 27 April 2011 12:43:16 Arnd Bergmann wrote:
On Wednesday 27 April 2011, Catalin Marinas wrote:
On 21 April 2011 20:29, Arnd Bergmann [off-list ref] wrote:
I think the recent discussions on linaro-mm-sig and the BoF last week
at ELC have been quite productive, and at least my understanding
of the missing pieces has improved quite a bit. This is a list of
things that I think need to be done in the kernel. Please complain
if any of these still seem controversial:

1. Fix the arm version of dma_alloc_coherent. It's in use today and
  is broken on modern CPUs because it results in both cached and
  uncached mappings. Rebecca suggested different approaches how to
  get there.

It's not broken since we moved to using Normal non-cacheable memory
for the coherent DMA buffers (as long as you flush the cacheable alias
before using the buffer, as we already do). The ARM ARM currently says
unpredictable for such situations but this is being clarified in
future updates and the Normal non-cacheable vs cacheable aliases can
be used (given correct cache maintenance before using the buffer).

Thanks for that information, I believe a number of people in the
previous discussions were relying on the information from the
documentation. Are you sure that this is not only correct for the
cores made by ARM Ltd but also for the other implementations that
may have relied on documentation?

As I mentioned before, there are other architectures, where having
conflicting cache settings in TLB entries for the same physical page
immediately checkstops the CPU, and I guess that this was also allowed
by the current version of the ARM ARM.
2. Implement dma_alloc_noncoherent on ARM. Marek pointed out
  that this is needed, and it currently is not implemented, with
  an outdated comment explaining why it used to not be possible
  to do it.

As Russell pointed out, there are 4 main combinations with iommu and
some coherency support (i.e. being able to snoop the CPU caches). But
in an SoC you can have different devices with different iommu and
coherency configurations. Some of them may even be able to see the L2
cache but not the L1 (in which case it would help if we can get an
inner non-cacheable outer cacheable mapping).

Anyway, we end up with different DMA ops per device via dev_archdata.

Having different DMA ops per device was the solution that I was suggesting
with dma_mapping_common.h, but Russell pointed out that it may not be
the best option.
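
For readers following along, the per-device approach discussed above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the structures are cut-down stand-ins for the kernel ones, the `iommu_base` field and the toy address arithmetic are hypothetical, and the two mapping helpers are mocks rather than real kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

struct device;

/* Cut-down operations table; a real one would carry many more hooks. */
struct dma_map_ops {
	dma_addr_t (*map_single)(struct device *dev, void *cpu_addr, size_t size);
};

/* Per-device architecture data holding the ops pointer. */
struct dev_archdata {
	const struct dma_map_ops *dma_ops;
};

struct device {
	struct dev_archdata archdata;
	dma_addr_t iommu_base;	/* hypothetical: start of the device's IOMMU window */
};

/* Mock linear mapping: the bus address equals the CPU address. */
static dma_addr_t linear_map_single(struct device *dev, void *cpu_addr, size_t size)
{
	(void)dev;
	(void)size;
	return (dma_addr_t)(uintptr_t)cpu_addr;
}

/* Mock IOMMU mapping: place the buffer's page offset inside the window. */
static dma_addr_t iommu_map_single(struct device *dev, void *cpu_addr, size_t size)
{
	(void)size;
	return dev->iommu_base + ((uintptr_t)cpu_addr & 0xfff);
}

static const struct dma_map_ops linear_dma_ops = { .map_single = linear_map_single };
static const struct dma_map_ops iommu_dma_ops = { .map_single = iommu_map_single };

/* The generic entry point only dispatches through the device's ops table. */
static dma_addr_t dma_map_single(struct device *dev, void *cpu_addr, size_t size)
{
	assert(dev && dev->archdata.dma_ops);
	return dev->archdata.dma_ops->map_single(dev, cpu_addr, size);
}
```

The cost of this design is one indirect call per mapping operation; the benefit is that the generic code never needs to know which kind of device it is dealing with.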

The alternative would be to have just one set of dma_mapping functions
as we do today, but to extend the functions to also cover the iommu
case, for instance (example, don't take literally):

static inline dma_addr_t dma_map_single(struct device *dev, void *cpu_addr,
                size_t size, enum dma_data_direction dir)
{
	dma_addr_t ret;

#ifdef CONFIG_DMABOUNCE
	if (dev->archdata.dmabounce)
		return dmabounce_map_single(dev, cpu_addr, size, dir);
#endif

#ifdef CONFIG_IOMMU
	if (dev->archdata.iommu)
		ret = iommu_map_single(dev, cpu_addr, size, dir);
	else
#endif

I wish it was that simple.

The OMAP4 ISS (Imaging Subsystem) has no IOMMU, but it can use the OMAP4 DMM 
(Dynamic Memory Manager) which acts as a memory remapper. Basically (if my 
understanding is correct), the ISS is configured to read/write from/to 
physical addresses. If those physical addresses are in the DMM address range, 
the DMM translates the accesses to physical accesses, acting as an IOMMU.

The ISS can thus write to physically contiguous memory directly, or to 
scattered physical pages through the DMM. Whether an IOMMU (or, to be correct 
in this case, the IOMMU-like DMM) needs to handle the DMA is a per-buffer 
decision, not a per-device decision.

		ret = virt_to_dma(dev, cpu_addr);	/* no IOMMU: linear translation */

	dma_sync_single_for_device(dev, ret, size, dir);

	return ret;
}

This would not even conflict with having a common implementation
for iommu based dma_map_ops -- we would just call the iommu functions
directly when needed rather than having an indirect function call.
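
To make the per-buffer point concrete, here is a minimal userspace sketch of a mapping function that looks at the buffer rather than the device. Everything in it is an assumption for illustration: `struct dma_buffer`, `map_buffer()`, `dmm_map_pages()` and the `DMM_APERTURE_BASE` value are hypothetical names and numbers, not the OMAP4 DMM interface, and the remapper is only simulated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

#define PAGE_SIZE		4096u
#define DMM_APERTURE_BASE	0x60000000u	/* hypothetical remapper window */

/* Hypothetical buffer description: one physical address per page. */
struct dma_buffer {
	const uint64_t *page_phys;
	size_t nr_pages;
};

/* A buffer is contiguous if every page directly follows the previous one. */
static bool buffer_is_contiguous(const struct dma_buffer *buf)
{
	for (size_t i = 1; i < buf->nr_pages; i++)
		if (buf->page_phys[i] != buf->page_phys[i - 1] + PAGE_SIZE)
			return false;
	return true;
}

/*
 * Stand-in for the remapper: a real driver would program one entry per
 * page here. This mock only returns the aperture address of the slot.
 */
static dma_addr_t dmm_map_pages(const struct dma_buffer *buf, size_t slot)
{
	(void)buf;
	return DMM_APERTURE_BASE + (dma_addr_t)slot * PAGE_SIZE;
}

/* The choice between direct and remapped access is made per buffer. */
static dma_addr_t map_buffer(const struct dma_buffer *buf, size_t slot, bool *used_dmm)
{
	assert(buf && buf->nr_pages > 0 && used_dmm);
	if (buffer_is_contiguous(buf)) {
		*used_dmm = false;
		return buf->page_phys[0];	/* device uses the physical address */
	}
	*used_dmm = true;
	return dmm_map_pages(buf, slot);	/* device goes through the remapper */
}
```

The same device can therefore end up with a plain physical address for one buffer and a remapped address for the next, which a per-device flag in archdata cannot express.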
-- 
Regards,

Laurent Pinchart