Thread (21 messages) 21 messages, 5 authors, 2017-03-02

[PATCH 3/3] PCI: Xilinx NWL PCIe: Fix Error for multi function device for legacy interrupts.

From: Bharat Kumar Gogada <hidden>
Date: 2017-03-02 08:46:35
Also in: linux-pci, lkml

Hi,

Any one is working on fix for this issue ? 

Regards,
Bharat
-----Original Message-----
From: Bjorn Helgaas [mailto:helgaas at kernel.org]
Sent: Tuesday, September 13, 2016 8:35 PM
To: Marc Zyngier <redacted>
Cc: Bharat Kumar Gogada <redacted>; robh at kernel.org;
bhelgaas at google.com; colin.king at canonical.com; Soren Brinkmann
[off-list ref]; Michal Simek [off-list ref]; arnd at arndb.de;
linux-arm-kernel at lists.infradead.org; linux-pci at vger.kernel.org; linux-
kernel at vger.kernel.org; Ravikiran Gummaluri [off-list ref]
Subject: Re: [PATCH 3/3] PCI: Xilinx NWL PCIe: Fix Error for multi function device
for legacy interrupts.

On Tue, Sep 13, 2016 at 08:41:28AM +0100, Marc Zyngier wrote:
quoted
On 12/09/16 23:02, Bjorn Helgaas wrote:
quoted
On Thu, Sep 01, 2016 at 05:19:55AM +0000, Bharat Kumar Gogada wrote:
quoted
quoted
quoted
quoted
quoted
quoted
Hi Bharat,
quoted
@@ -561,7 +561,7 @@ static int
nwl_pcie_init_irq_domain(struct nwl_pcie
*pcie)
quoted
    }

    pcie->legacy_irq_domain =
irq_domain_add_linear(legacy_intc_node,
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
quoted
-                                                   INTX_NUM,
+                                                   INTX_NUM
+ + 1,
                                                    &legacy_domain_ops,
                                                    pcie);
This feels like the wrong thing to do. You have INTX_NUM irqs,
so the domain allocation should reflect this. On the other
hand, the way the driver currently deals with mappings is
quite broken (consistently adding 1 to
the HW interrupt).
quoted
quoted
Hi Marc,

Without above change I get following crash in kernel while booting.

[    2.441684] error: hwirq 0x4 is too large for dummy

[    2.441694] ------------[ cut here ]------------

[    2.441698] WARNING: at kernel/irq/irqdomain.c:344

[    2.441702] Modules linked in:

[    2.441706]

[    2.441714] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.4.0 #8

[    2.441718] Hardware name: xlnx,zynqmp (DT)

[    2.441723] task: ffffffc071886b80 ti: ffffffc071888000 task.ti:
ffffffc071888000
quoted
[    2.441732] PC is at irq_domain_associate+0x138/0x1c0

[    2.441738] LR is at irq_domain_associate+0x138/0x1c0

In kernel/irq/irqdomain.c function irq_domain_associate

if (WARN(hwirq >= domain->hwirq_max,
                 "error: hwirq 0x%x is too large for %s\n",
(int)hwirq, domain-
name))
quoted
quoted
                return -EINVAL;

Here the hwirq and hwirq_max are equal to 4 without the above
condition
(INTX_NUM + 1) due to which crash is coming.
quoted
This is happening as the legacy interrupts are starting from 1 (INTA).
I understood that. I'm still persisting in saying that you have the wrong
fix.
quoted
quoted
quoted
quoted
quoted
quoted
Your domain should always allocate many interrupts as you have
interrupt sources. These interrupts (hwirq) should be numbered
from 0 to (n-
1).
quoted
Agreed, but here comes the problem the hwirq for legacy
interrupts will start at 0x1 to 0x4 (INTA to INTD) and these
values are as per PCIe specification for legacy interrupts. So
these cannot be numbered from 0. So when 0x4 (INTD) for a
multi-function device comes the crash occurs.
So who provides this hwirq? Who calls irq_domain_associate() with
hwirq set to 4?
PCIe subsystem invokes pcibios_add_device function in
arch/arm64/kernel/pci.c for every pci device.
quoted
quoted
quoted
The purpose of this function is to assign dev->irq using
of_irq_parse_and_map_pci.
quoted
quoted
quoted
of_irq_parse_and_map_pci invokes of_irq_parse_pci where it reads
PCI_INTERRUPT_PIN from configuration space and saves it in parameter of
struct of_phandle_args.
quoted
quoted
quoted
This structure is passed to irq_create_of_mapping where it invokes
irq_create_fwspec_mapping.
quoted
quoted
quoted
irq_create_fwspec_mapping invokes irq_domain_translate and gets
hwirq, here the above saved PCI_INTERRUPT_PIN value is assigned to hwirq
(*hwirq = fwspec->param[0]).
quoted
quoted
quoted
And then using this hwirq irq_create_mapping -> irq_domain_associate
were invoked and mapping is created for virtual irq with this hwirq.
quoted
quoted
quoted
So for any end point PCI_INTERRUPT_PIN value starts from 0x1 to 0x4 and
so hwirq starts from 0x1 to 0x4.
quoted
quoted
quoted
So the values are more generic w.r.t to protocol, that's why hwirq will
range from 0x1 to 0x4.
quoted
quoted
quoted
And then if you check pcie-altera.c they are doing this adding one in their
handler and while creating legacy domain.
quoted
quoted
Is this resolved yet?  Marc, are you happy, or should we iterate on
this again?
Ah, sorry to have dropped the ball on this patch.
No problem, I wasn't making forward progress anyway.
quoted
I guess that given that the infrastructure imposes the hwirq range on
the host drivers, Bharat's approach is the only way (and a number of
other host drivers are already slightly broken). I'll try and have a
look at solving this at the generic level. In the meantime:

Acked-by: Marc Zyngier <redacted>
After looking at this myself, I'm not happy with this either.  It feels like there are
bugs lurking here and we're just hiding one of them.

Here are the callers of irq_domain_add_linear() for legacy INTx in
drivers/pci/host:

  advk_pcie_init_irq_domain    LEGACY_IRQ_NUM   (4)
  dra7xx_pcie_init_irq_domain  4
  ks_dw_pcie_host_init         MAX_LEGACY_IRQS  (4)
  altera_pcie_init_irq_domain  INTX_NUM + 1     (5)
  nwl_pcie_init_irq_domain     INTX_NUM + 1     (5)
  xilinx_pcie_init_irq_domain  4

I think all of these use the of_irq_parse_and_map_pci() path you mentioned, so
if the problem is in the way that path works, I would think these should *all* be
requesting the same number of interrupts in the domain.

I agree with Marc that we should request 4 IRQs, because that's what we need.
If we can't do that for some reason, we ought to at least make all these callers
the same.

Bjorn
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help