[PATCH V7 08/11] drivers: acpi: Handle IOMMU lookup failure with deferred probing or error

From: Sinan Kaya <hidden>
Date: 2017-01-30 15:46:39
Also in: linux-acpi, linux-arm-msm, linux-iommu, linux-pci

On 1/30/2017 9:54 AM, Nate Watterson wrote:
On 2017-01-30 09:38, Will Deacon wrote:
On Mon, Jan 30, 2017 at 09:33:50AM -0500, Sinan Kaya wrote:
On 1/30/2017 9:23 AM, Nate Watterson wrote:
On 2017-01-30 08:59, Sinan Kaya wrote:
On 1/30/2017 7:22 AM, Robin Murphy wrote:
On 29/01/17 17:53, Sinan Kaya wrote:
On 1/24/2017 7:37 AM, Lorenzo Pieralisi wrote:
[+hanjun, tomasz, sinan]

It is quite a key patchset, I would be glad if they can test on their
respective platforms with IORT.
Tested on top of 4.10-rc5.

1. Platform HIDMA device passed dmatest.
2. Seeing some USB stalls on a platform USB device.
3. PCIe NVMe drive probed and worked fine with MSI interrupts after boot.
4. NVMe driver didn't probe following a hotplug insertion and received an
   SMMU error event during the insertion.
What was the SMMU error - a translation/permission fault (implying the
wrong DMA ops) or a bad STE fault (implying we totally failed to tell
the SMMU about the device at all)?
root@ubuntu:/sys/bus/pci/slots/4# echo 0 > power

[  204.698522] iommu: Removing device 0003:01:00.0 from group 0
[  204.708704] pciehp 0003:00:00.0:pcie004: Slot(4): Link Down
[  204.708723] pciehp 0003:00:00.0:pcie004: Slot(4): Link Down event ignored; already powering off

root@ubuntu:/sys/bus/pci/slots/4#

[  254.820440] iommu: Adding device 0003:01:00.0 to group 8
[  254.820599] nvme nvme0: pci function 0003:01:00.0
[  254.820621] nvme 0003:01:00.0: enabling device (0000 -> 0002)
[  261.948558] arm-smmu-v3 arm-smmu-v3.0.auto: event 0x0a received:
[  261.948561] arm-smmu-v3 arm-smmu-v3.0.auto:  0x000001000000000a
[  261.948563] arm-smmu-v3 arm-smmu-v3.0.auto:  0x0000000000000000
[  261.948564] arm-smmu-v3 arm-smmu-v3.0.auto:  0x0000000000000000
[  261.948566] arm-smmu-v3 arm-smmu-v3.0.auto:  0x0000000000000000
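
For readers following the log, the first 64-bit word of the event record can be decoded by hand. The sketch below is only an illustration, not code from the driver: it assumes the SMMUv3 event record layout in which bits [7:0] of the first word hold the event ID and bits [63:32] hold the StreamID. The function name and the small lookup table are made up for this example, and the table lists only the fault types mentioned in this thread.

```python
# Illustrative sketch: decode word 0 of an SMMUv3 event record.
# Assumed layout: bits [7:0] = event ID, bits [63:32] = StreamID.
# decode_event_word0 and EVENT_NAMES are hypothetical names for this example.

EVENT_NAMES = {
    0x04: "C_BAD_STE",      # bad stream table entry
    0x0A: "C_BAD_CD",       # bad context descriptor
    0x10: "F_TRANSLATION",  # translation fault
    0x13: "F_PERMISSION",   # permission fault
}


def decode_event_word0(word0):
    """Return (event_id, stream_id, name) for the first 64-bit event word."""
    event_id = word0 & 0xFF
    stream_id = (word0 >> 32) & 0xFFFFFFFF
    return event_id, stream_id, EVENT_NAMES.get(event_id, "unknown")


# First word printed by the driver in the log above.
event_id, stream_id, name = decode_event_word0(0x000001000000000A)
print(f"event 0x{event_id:02x} ({name}), StreamID 0x{stream_id:x}")
```

A StreamID of 0x100 would line up with PCI function 01:00.0 (bus 1, devfn 0) only if the firmware maps requester IDs to StreamIDs one-to-one, which is an assumption about this platform and not something the log confirms.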
Looks like C_BAD_CD. Can you please try with:
iommu/arm-smmu-v3: Clear prior settings when updating STEs
This resolved the issue. Can we pull Nate's patch into 4.10 so that I don't see
this issue again?
I already sent the pull request to Joerg for 4.11. Do you see this problem
without Sricharan's patches (i.e. vanilla mainline)? If so, we'll need to
send the patch to stable after -rc1.
Using vanilla mainline, I see it most commonly when directly assigning
a device to a guest machine. I think I've also seen it after removing then
re-adding a PCI device. Basically anytime an STE's CTX pointer is changed
from a non-NULL value and STE[CFG] indicates translation will be performed.
I was not able to reproduce the issue with a vanilla kernel. I only tested hotplug.
Nate
Will

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.