Re: [PATCH 5/5] powerpc: use the generic dma_ops_bypass mode
From: Cédric Le Goater <clg@kaod.org>
Date: 2020-08-31 07:28:00
Also in: linux-iommu, lkml
On 8/31/20 8:40 AM, Christoph Hellwig wrote:
> On Sun, Aug 30, 2020 at 11:04:21AM +0200, Cédric Le Goater wrote:
>> Hello,
>>
>> On 7/8/20 5:24 PM, Christoph Hellwig wrote:
>>> Use the DMA API bypass mechanism for direct window mappings. This uses
>>> common code and speed up the direct mapping case by avoiding indirect
>>> calls just when not using dma ops at all. It also fixes a problem where
>>> the sync_* methods were using the bypass check for DMA allocations, but
>>> those are part of the streaming ops.
>>>
>>> Note that this patch loses the DMA_ATTR_WEAK_ORDERING override, which
>>> has never been well defined, as is only used by a few drivers, which
>>> IIRC never showed up in the typical Cell blade setups that are affected
>>> by the ordering workaround.
>>>
>>> Fixes: efd176a04bef ("powerpc/pseries/dma: Allow SWIOTLB")
>>> Signed-off-by: Christoph Hellwig <hch@lst.de>
>>> ---
>>>  arch/powerpc/Kconfig              |  1 +
>>>  arch/powerpc/include/asm/device.h |  5 --
>>>  arch/powerpc/kernel/dma-iommu.c   | 90 ++++---------------------------
>>>  3 files changed, 10 insertions(+), 86 deletions(-)
>>
>> I am seeing corruptions on a couple of POWER9 systems (boston) when
>> stressed with IO. stress-ng gives some results but I have first seen
>> it when compiling the kernel in a guest and this is still the best way
>> to raise the issue.
>>
>> These systems have a SAS Adaptec controller:
>>
>>   0003:01:00.0 Serial Attached SCSI controller: Adaptec Series 8 12G SAS/PCIe 3 (rev 01)
>>
>> When the failure occurs, the POWERPC EEH interrupt fires and dumps
>> lowlevel PHB4 registers among which:
>>
>>   [ 2179.251069490,3] PHB#0003[0:3]: phbErrorStatus = 0000028000000000
>>   [ 2179.251117476,3] PHB#0003[0:3]: phbFirstErrorStatus = 0000020000000000
>>
>> The bits raised identify a PPC 'TCE' error, which means it is related
>> to DMAs. See below for more details.
>>
>> Reverting this patch "fixes" the issue but it is probably elsewhere,
>> in some other layers or in the aacraid driver. How should I proceed
>> to get more information?
>
> The aacraid DMA masks look like a mess. Can you try the hack below and
> see if it helps?
No effect. The system crashes the same way. But Alexey spotted some issue
with swiotlb.

C.
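
As a side note on the two PHB4 register values quoted in the report, the set bits can be listed with a small script. This is only a sketch: the helper name `set_bits` is made up for illustration, and the MSB-0 column assumes the usual IBM convention of numbering bit 0 as the most significant bit of a 64-bit register. It does not decode what each bit means; that would need the PHB4 register documentation.

```python
def set_bits(value, width=64):
    """Return (lsb0, msb0) positions of every set bit, highest bit first.

    lsb0 counts from the least significant bit (bit 0 = 2**0).
    msb0 counts from the most significant bit, which is assumed here
    to match the IBM numbering used for PHB registers.
    """
    return [
        (bit, width - 1 - bit)
        for bit in range(width - 1, -1, -1)
        if (value >> bit) & 1
    ]


# Values copied from the EEH dump quoted in the report.
phb_error_status = 0x0000028000000000
phb_first_error_status = 0x0000020000000000

for name, reg in (
    ("phbErrorStatus", phb_error_status),
    ("phbFirstErrorStatus", phb_first_error_status),
):
    positions = ", ".join(f"lsb0={l}/msb0={m}" for l, m in set_bits(reg))
    print(f"{name:>20} = {reg:016x}: {positions}")

# Bits present in the accumulated status but not in the first-error latch.
later = phb_error_status & ~phb_first_error_status
print("raised after first error:", set_bits(later))
```

The last line shows which bits are in phbErrorStatus but not in phbFirstErrorStatus, which can help separate the initial error from follow-on errors.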