Re: [PATCH 2/2] arm: use swiotlb for bounce buffer on LPAE configs | linux-arm-kernel

quoted

Christoph, Robin,

On 09/01/2020 16.49, Christoph Hellwig wrote:
On Wed, Jan 08, 2020 at 03:20:07PM +0000, Robin Murphy wrote:
The problem - I think - is that the DMA_BIT_MASK(32) from
dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32)) is treated as physical
address along the call path so the dma_pfn_offset is applied to it and
the check will fail, saying that DMA_BIT_MASK(32) can not be supported.
But that's the thing - in isolation, that is entirely correct. Considering 
ZONE_DMA32 for simplicity, in general the zone is expected to cover the 
physical address range 0x0000_0000 - 0xffff_ffff (because DMA offsets are 
relatively rare), and a device with a dma_pfn_offset of more than 
(0x1_0000_0000 >> PAGE_SHIFT) *cannot* support that range with any mask, 
because the DMA address itself would have to be negative.
Note that ZONE_DMA32 is irrelevant in this particular case, as we are
talking about arm32.  But with ZONE_DMA instead this roughly makes sense.

The problem is that platforms with esoteric memory maps have no right thing 
to do. If the base of RAM is at at 0x1_0000_0000 or higher, the "correct" 
ZONE_DMA32 would be empty while ZONE_NORMAL above it would not, and last 
time I looked that makes the page allocator break badly. So the standard 
bodge on such platforms is to make ZONE_DMA32 cover not the first 4GB of 
*PA space*, but the first 4GB of *RAM*, wherever that happens to be. That 
then brings different problems - now the page allocator is happy and 
successfully returns GFP_DMA32 allocations from the range 0x8_0000_0000 - 
0x8_ffff_ffff that are utterly useless to 32-bit devices with zero 
dma_pfn_offset - see the AMD Seattle SoC for the prime example of that. If 
on the other hand all devices are guaranteed to have a dma_pfn_offset that 
puts the base of RAM at DMA address 0 then GFP_DMA32 allocations do end up 
working as expected, but now the original assumption of where ZONE_DMA32 
actually is is broken, so generic code unaware of the 
platform/architecture-specific bodge will be misled - that's the case 
you're running into.

Having thought this far, if there's a non-hacky way to reach in and grab 
ZONE_DMA{32} such that dma_direct_supported() could use zone_end_pfn() 
instead of trying to assume either way, that might be the most robust 
general solution.
zone_dma_bits is our somewhat ugly way to try to poke into this
information, although the way it is done right now sucks pretty badly.
In my view the handling of dma_pfn_offset is just incorrect as it is applied to _any_ address.
According to DT specification dma-ranges:
"Value type: <empty> or <prop-encoded-array> encoded as an arbitrary
number of (child-bus-address, parent-bus-address, length) triplets."

Yet in drivers/of/ we only take the _first_ triplet and ignore the rest.

The dma_pfn_offset should be only applied to paddr in the range:
parent-bus-address to parent-bus-address+length
for anything outside of this the dma_pfn_offset is 0.

conversion back from dma to paddr should consider the offset in range:
child-bus-address to child-bus-address+length
and 0 for everything outside of this.

To correctly handle the dma-ranges we would need something like this in device.h:
+struct dma_ranges {
+       u64 paddr;
+       u64 dma_addr;
+       u64 size;
+	unsigned long pfn_offset;
+};
+

struct device {
	...
-	unsigned long	dma_pfn_offset;
+       struct dma_ranges *dma_ranges;
	int dma_ranges_cnt;
	...
};

Then when we currently use dma_pfn_offset we would have:

unsigned long __phys_to_dma_pfn_offset(struct device *dev, phys_addr_t paddr)
{
	int i;

	if (!dev->dma_ranges)
		return 0;

	for (i = 0; i < dev->dma_ranges_cnt; i++) {
		struct dma_ranges *range = &dev->dma_ranges[i];
		if (paddr >= range->paddr &&
		    paddr <= (range->paddr + range->size))
			return range->pfn_offset;
	}

	return 0;
}

unsigned long __dma_to_phys_pfn_offset(struct device *dev, dma_addr_t dma_addr)
{
	int i;

	for (i = 0; i < dev->dma_ranges_cnt; i++) {
		struct dma_ranges *range = &dev->dma_ranges[i];
		if (dma_addr >= range->dma_addr &&
		    dma_addr <= (range->dma_addr + range->size))
			return range->pfn_offset;
	}

	return 0;
}

For existing drivers/archs setting dma_pfn_offset we can:
if (dev->dma_ranges_cnt == 1 && dev->dma_ranges[0].pfn_offset && !dev->dma_ranges[0].size)
	return dev->dma_ranges[0].pfn_offset;

and they would have to set up one struct dma_ranges.

One of the issue with this is that the struct dma_ranges would need to be allocated for
all devices, so there should be a some clever way need to be invented to use pointers
as much as we can.

The patch I sent to Peter in December was trying to convey that
information in a way similar to what the arm32 legacy dma code does, but
it didn't work, so I'll need to find some time to sit down and figure out
why.
But, while we get a proper solution can we get the following patch in to fix the regression?
Basically we are falling back to what works (and was used before commit ad3c7b18c5b362be5dbd0f2c0bcf1fd5fd659315).

commit 8c3c36b377c139603a9dff5c58dac59865f1ac0f
Author: Peter Ujfalusi [off-list ref]
Date:   Thu Dec 19 15:07:25 2019 +0200

    arm: mm: dma-mapping: Fix dma_supported() when dev->dma_pfn_offset is not 0
    
    We can only use direct mapping when LPAE is enabled if the dma_pfn_offset
    is 0, otherwise valid dma_masks will be rejected and the DMA support is
    going to be denied for peripherals, or DMA drivers.
    
    Cc: Stable [off-list ref] #v5.3+
    Signed-off-by: Peter Ujfalusi [off-list ref]

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 9414d72f664b..e07ec1ea3865 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1100,15 +1100,6 @@ int arm_dma_supported(struct device *dev, u64 mask)
 
 static const struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 {
-	/*
-	 * When CONFIG_ARM_LPAE is set, physical address can extend above
-	 * 32-bits, which then can't be addressed by devices that only support
-	 * 32-bit DMA.
-	 * Use the generic dma-direct / swiotlb ops code in that case, as that
-	 * handles bounce buffering for us.
-	 */
-	if (IS_ENABLED(CONFIG_ARM_LPAE))
-		return NULL;
 	return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
 }
 
@@ -2313,6 +2304,15 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 
 	if (arm_setup_iommu_dma_ops(dev, dma_base, size, iommu))
 		dma_ops = arm_get_iommu_dma_map_ops(coherent);
+	else if (IS_ENABLED(CONFIG_ARM_LPAE) && !dev->dma_pfn_offset)
+		/*
+		 * When CONFIG_ARM_LPAE is set, physical address can extend
+		 * above * 32-bits, which then can't be addressed by devices
+		 * that only support 32-bit DMA.
+		 * Use the generic dma-direct / swiotlb ops code in that case,
+		 * as that handles bounce buffering for us.
+		 */
+		dma_ops = NULL;
 	else
 		dma_ops = arm_get_dma_map_ops(coherent);

--- 
- Péter

Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki. Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help