Re: [PATCH v2] powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs
From: Alexey Kardashevskiy <hidden>
Date: 2023-05-22 00:09:55
Hi Gaurav, Sorry I missed this. Please share the link to the your fix, I do not see it in my mail. In general, the problem can probably be solved by using huge pages (anything more than 64K) only for 1:1 mapping. On 03/05/2023 13:25, Gaurav Batra wrote:
Hello Alexey, I recently joined IOMMU team. There was a bug reported by test team where Mellanox driver was timing out during configuration. I proposed a fix for the same, which is below in the email. You suggested a fix for Srikar's reported problem. Basically, both these fixes will resolve Srikar and Mellanox driver issues. The problem is with 2MB DDW. Since you have extensive knowledge of IOMMU design and code, in your opinion, which patch should we adopt? Thanks a lot Gaurav On 4/20/23 2:45 PM, Gaurav Batra wrote:quoted
Hello Michael, I was looking into the Bug: 199106 (https://bugzilla.linux.ibm.com/show_bug.cgi?id=199106). In the Bug, Mellanox driver was timing out when enabling SRIOV device. I tested, Alexey's patch and it fixes the issue with Mellanox driver. The down side to Alexey's fix is that even a small memory request by the driver will be aligned up to 2MB. In my test, the Mellanox driver is issuing multiple requests of 64K size. All these will get aligned up to 2MB, which is quite a waste of resources. In any case, both the patches work. Let me know which approach you prefer. In case we decide to go with my patch, I just realized that I need to fix nio_pages in iommu_free_coherent() as well. Thanks, Gaurav On 4/20/23 10:21 AM, Michael Ellerman wrote:quoted
Gaurav Batra [off-list ref] writes:quoted
When DMA window is backed by 2MB TCEs, the DMA address for the mapped page should be the offset of the page relative to the 2MB TCE. The code was incorrectly setting the DMA address to the beginning of the TCE range. Mellanox driver is reporting timeout trying to ENABLE_HCA for an SR-IOV ethernet port, when DMA window is backed by 2MB TCEs.I assume this is similar or related to the bug Srikar reported? https://lore.kernel.org/linuxppc-dev/20230323095333.GI1005120@linux.vnet.ibm.com/ (local) In that thread Alexey suggested a patch, have you tried his patch? He suggested rounding up the allocation size, rather than adjusting the dma_handle.quoted
Fixes: 3872731187141d5d0a5c4fb30007b8b9ec36a44dThat's not the right syntax, it's described in the documentation how to generate it. It should be: Fixes: 387273118714 ("powerps/pseries/dma: Add support for 2M IOMMU page size") cheersquoted
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c index ee95937bdaf1..ca57526ce47a 100644 --- a/arch/powerpc/kernel/iommu.c +++ b/arch/powerpc/kernel/iommu.c@@ -517,7 +517,7 @@ int ppc_iommu_map_sg(struct device *dev, structiommu_table *tbl, /* Convert entry to a dma_addr_t */ entry += tbl->it_offset; dma_addr = entry << tbl->it_page_shift; - dma_addr |= (s->offset & ~IOMMU_PAGE_MASK(tbl)); + dma_addr |= (vaddr & ~IOMMU_PAGE_MASK(tbl)); DBG(" - %lu pages, entry: %lx, dma_addr: %lx\n", npages, entry, dma_addr);@@ -904,6 +904,7 @@ void *iommu_alloc_coherent(struct device *dev,struct iommu_table *tbl, unsigned int order; unsigned int nio_pages, io_order; struct page *page; + int tcesize = (1 << tbl->it_page_shift); size = PAGE_ALIGN(size); order = get_order(size);@@ -930,7 +931,8 @@ void *iommu_alloc_coherent(struct device *dev,struct iommu_table *tbl, memset(ret, 0, size); /* Set up tces to cover the allocated range */ - nio_pages = size >> tbl->it_page_shift; + nio_pages = IOMMU_PAGE_ALIGN(size, tbl) >> tbl->it_page_shift; + io_order = get_iommu_order(size, tbl); mapping = iommu_alloc(dev, tbl, ret, nio_pages, DMA_BIDIRECTIONAL, mask >> tbl->it_page_shift, io_order, 0);@@ -938,7 +940,8 @@ void *iommu_alloc_coherent(struct device *dev,struct iommu_table *tbl, free_pages((unsigned long)ret, order); return NULL; } - *dma_handle = mapping; + + *dma_handle = mapping | ((u64)ret & (tcesize - 1)); return ret; } --
-- Alexey