Re: CAAM: kernel BUG at drivers/crypto/caam/jr.c:230! (and dma-coherent query)
From: Horia Geantă <horia.geanta@nxp.com>
Date: 2021-03-04 00:17:05
Also in: linux-crypto, linux-iommu
Adding some people to the loop; maybe they can help explain why the lack of a "dma-coherent" property for a HW-coherent device could lead to unexpected / strange side effects.

On 3/1/2021 5:22 PM, Sascha Hauer wrote:
> Hi All,
>
> I am on a Layerscape LS1046a using Linux-5.11. The CAAM driver sometimes
> crashes during the run-time self tests with:
>
> kernel BUG at drivers/crypto/caam/jr.c:247!
> Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.11.0-20210225-3-00039-g434215968816-dirty #12
> Hardware name: TQ TQMLS1046A SoM on Arkona AT1130 (C300) board (DT)
> pstate: 60000005 (nZCv daif -PAN -UAO -TCO BTYPE=--)
> pc : caam_jr_dequeue+0x98/0x57c
> lr : caam_jr_dequeue+0x98/0x57c
> sp : ffff800010003d50
> x29: ffff800010003d50 x28: ffff8000118d4000
> x27: ffff8000118d4328 x26: 00000000000001f0
> x25: ffff0008022be480 x24: ffff0008022c6410
> x23: 00000000000001f1 x22: ffff8000118d4329
> x21: 0000000000004d80 x20: 00000000000001f1
> x19: 0000000000000001 x18: 0000000000000020
> x17: 0000000000000000 x16: 0000000000000015
> x15: ffff800011690230 x14: 2e2e2e2e2e2e2e2e
> x13: 2e2e2e2e2e2e2020 x12: 3030303030303030
> x11: ffff800011700a38 x10: 00000000fffff000
> x9 : ffff8000100ada30 x8 : ffff8000116a8a38
> x7 : 0000000000000001 x6 : 0000000000000000
> x5 : 0000000000000000 x4 : 0000000000000000
> x3 : 00000000ffffffff x2 : 0000000000000000
> x1 : 0000000000000000 x0 : 0000000000001800
> Call trace:
>  caam_jr_dequeue+0x98/0x57c
>  tasklet_action_common.constprop.0+0x164/0x18c
>  tasklet_action+0x44/0x54
>  __do_softirq+0x160/0x454
>  __irq_exit_rcu+0x164/0x16c
>  irq_exit+0x1c/0x30
>  __handle_domain_irq+0xc0/0x13c
>  gic_handle_irq+0x5c/0xf0
>  el1_irq+0xb4/0x180
>  arch_cpu_idle+0x18/0x30
>  default_idle_call+0x3c/0x1c0
>  do_idle+0x23c/0x274
>  cpu_startup_entry+0x34/0x70
>  rest_init+0xdc/0xec
>  arch_call_rest_init+0x1c/0x28
>  start_kernel+0x4ac/0x4e4
> Code: 91392021 912c2000 d377d8c6 97f24d96 (d4210000)
>
> The driver iterates over the descriptors in the output ring and matches
> them with the ones it has previously queued. If it doesn't find a matching
> descriptor it complains with the BUG_ON() seen above. What I see sometimes
> is that the address in the output ring is 0x0, the job status in this case
> is 0x40000006 (meaning DECO Invalid KEY command).
>
> It seems that the CAAM doesn't write the descriptor address to the output
> ring at least in some error cases. When we don't have the descriptor
> address of the failed descriptor we have no way to find it in the list of
> queued descriptors, thus we also can't find the callback for that
> descriptor.
>
> This looks very unfortunate, anyone else seen this or has an idea what to
> do about it?
>
> I haven't investigated yet which job actually fails and why. Of course
> that would be my ultimate goal to find that out.
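
For readers following along, the lookup described in the report can be modelled in a few lines of userspace C. This is only a sketch and not the code in drivers/crypto/caam/jr.c: the struct layouts, RING_DEPTH and all function names are made up for illustration. The status split (source in the top four bits, error code in the low byte) is an assumption that is consistent with reading 0x40000006 as "DECO, Invalid KEY command".

```c
/* Illustrative userspace model only; not taken from drivers/crypto/caam/jr.c. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_DEPTH 4	/* hypothetical ring depth, chosen for the example */

/* Software-side record of a job handed to the hardware (hypothetical layout). */
struct queued_job {
	uint64_t desc_dma;	/* bus address of the submitted descriptor */
	int in_use;		/* non-zero while the job is outstanding */
};

/* One output ring entry as written back by the hardware (hypothetical layout). */
struct outring_entry {
	uint64_t desc_dma;	/* address of the completed descriptor */
	uint32_t status;	/* job status word */
};

/*
 * Match a completed output ring entry against the outstanding jobs.
 * Returns the index of the matching job, or -1 if none matches. The -1
 * case corresponds to the situation in which the real driver hits BUG_ON().
 */
int find_queued_job(const struct queued_job *jobs, size_t n,
		    const struct outring_entry *done)
{
	assert(jobs != NULL && done != NULL);

	for (size_t i = 0; i < n; i++)
		if (jobs[i].in_use && jobs[i].desc_dma == done->desc_dma)
			return (int)i;

	return -1;
}

/* Assumed layout: bits 31..28 of the status word identify the source. */
unsigned int status_source(uint32_t status)
{
	return status >> 28;
}

/* Assumed layout: the low byte of the status word carries the error code. */
unsigned int status_error_code(uint32_t status)
{
	return status & 0xff;
}
```

With an entry whose address is 0x0 and whose status is 0x40000006, find_queued_job() returns -1 whatever is queued, even though the status word still decodes to source 0x4 and error code 0x06. That is the problem in a nutshell: the error is reported, but nothing ties it back to a job or its callback.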

This looks very similar to an earlier report from Greg. He confirmed that adding the "dma-coherent" property to the "crypto" DT node fixes the issue:
https://lore.kernel.org/linux-crypto/74f664f5-5433-d322-4789-3c78bdb814d8@kernel.org

Patch rebased on v5.11 is at the bottom. Does it work for you too?

What I don't understand (and the reason I've postponed upstreaming it) is _why_ exactly this patch works. I would have expected a HW-coherent device to work fine even without the "dma-coherent" DT property in the corresponding node.

I've found what seem to be related discussions involving eSDHC, but I am still trying to figure out what's happening. I'd really appreciate a clarification on what could go wrong (e.g. interactions with SW-based cache management etc.):
https://lore.kernel.org/linux-mmc/20190916171509.GG25745@shell.armlinux.org.uk
https://lore.kernel.org/lkml/20191010083503.250941866@linuxfoundation.org
https://lore.kernel.org/linux-mmc/AM7PR04MB688507B5B4D84EB266738891F8320@AM7PR04MB6885.eurprd04.prod.outlook.com

Thanks,
Horia

-- >8 --

Subject: [PATCH] arm64: dts: ls1046a: mark crypto engine dma coherent

Crypto engine (CAAM) on LS1046A platform has support for HW coherency,
mark accordingly the DT node.

Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
---
 arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
index 025e1f587662..6d4db3e021e8 100644
--- a/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-ls1046a.dtsi
@@ -325,6 +325,7 @@
 			ranges = <0x0 0x00 0x1700000 0x100000>;
 			reg = <0x00 0x1700000 0x0 0x100000>;
 			interrupts = <GIC_SPI 75 IRQ_TYPE_LEVEL_HIGH>;
+			dma-coherent;
 
 			sec_jr0: jr@10000 {
 				compatible = "fsl,sec-v5.4-job-ring",

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel