[PATCH v5 1/2] PCI: mediatek: Clear IRQ status after IRQ dispatched to avoid reentry

From: Lorenzo Pieralisi <hidden>
Date: 2018-01-04 18:39:47
Also in: linux-devicetree, linux-mediatek, linux-pci, lkml

[+Marc]

On Wed, Dec 27, 2017 at 08:59:53AM +0800, honghui.zhang at mediatek.com wrote:

From: Honghui Zhang <redacted>

There maybe a same IRQ reentry scenario after IRQ received in current
IRQ handle flow:
	EP device		PCIe host driver	EP driver
1. issue an IRQ
			2. received IRQ
			3. clear IRQ status
			4. dispatch IRQ
						5. clear IRQ source
The IRQ status was not successfully cleared at step 2 since the IRQ
source was not cleared yet. So the PCIe host driver may receive the
same IRQ after step 5. Then there's an IRQ reentry occurred.
Even worse, if the reentry IRQ was not an IRQ that EP driver expected,
it may not handle the IRQ. Then we may run into the infinite loop from
step 2 to step 4.
Clear the IRQ status after IRQ have been dispatched to avoid the IRQ
reentry.
This patch also fix another INTx IRQ issue by initialize the iterate
before the loop. If an INTx IRQ re-occurred while we are dispatching
the INTx IRQ, then iterate may start from PCI_NUM_INTX + INTX_SHIFT
instead of INTX_SHIFT for the second time entering the
for_each_set_bit_from() loop.

This looks like two different issues that should be fixed with two
patches.

Signed-off-by: Honghui Zhang <redacted>
Acked-by: Ryder Lee <ryder.lee@mediatek.com>
---
 drivers/pci/host/pcie-mediatek.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

For the sake of uniformity, I first want to understand why this
driver does not call:

chained_irq_enter/exit()

in the primary handler (mtk_pcie_intr_handler()).

With the GIC as a primary interrupt controller we have not
even figured out how current code can actually work without
calling the chained_* API.

I want to come up with a consistent handling of IRQ domains for
all host bridges and any discrepancy should be explained.

quoted hunk ↗ jump to hunk

diff --git a/drivers/pci/host/pcie-mediatek.c b/drivers/pci/host/pcie-mediatek.c
index db93efd..fc29a9a 100644
--- a/drivers/pci/host/pcie-mediatek.c
+++ b/drivers/pci/host/pcie-mediatek.c

@@ -601,15 +601,16 @@ static irqreturn_t mtk_pcie_intr_handler(int irq, void *data)
 	struct mtk_pcie_port *port = (struct mtk_pcie_port *)data;
 	unsigned long status;
 	u32 virq;
-	u32 bit = INTX_SHIFT;
+	u32 bit;
 
 	while ((status = readl(port->base + PCIE_INT_STATUS)) & INTX_MASK) {
+		bit = INTX_SHIFT;
 		for_each_set_bit_from(bit, &status, PCI_NUM_INTX + INTX_SHIFT) {
-			/* Clear the INTx */
-			writel(1 << bit, port->base + PCIE_INT_STATUS);
 			virq = irq_find_mapping(port->irq_domain,
 						bit - INTX_SHIFT);
 			generic_handle_irq(virq);
+			/* Clear the INTx */
+			writel(1 << bit, port->base + PCIE_INT_STATUS);

I think that these masking/acking should actually be done through
the irq_chip hooks (see for instance pci-ftpci100.c) - that would
make this kind of bugs much easier to prevent (because the IRQ
layer does the sequencing for you).

Marc (CC'ed) has a more comprehensive view on this than me - I would
like to get to a point where all host bridges uses a consistent
approach for chained IRQ handling and I hope this bug fix can be
a starting point.

Thanks,
Lorenzo

quoted hunk ↗ jump to hunk

 		}
 	}

@@ -619,10 +620,10 @@ static irqreturn_t mtk_pcie_intr_handler(int irq, void *data)
 
 			while ((imsi_status = readl(port->base + PCIE_IMSI_STATUS))) {
 				for_each_set_bit(bit, &imsi_status, MTK_MSI_IRQS_NUM) {
-					/* Clear the MSI */
-					writel(1 << bit, port->base + PCIE_IMSI_STATUS);
 					virq = irq_find_mapping(port->msi_domain, bit);
 					generic_handle_irq(virq);
+					/* Clear the MSI */
+					writel(1 << bit, port->base + PCIE_IMSI_STATUS);
 				}
 			}
 			/* Clear MSI interrupt status */

-- 
2.6.4

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help