Thread (41 messages) 41 messages, 3 authors, 2021-05-20

Re: [PATCH 07/31] powerpc/xive: Fix xive_irq_set_affinity for MSI

From: Cédric Le Goater <clg@kaod.org>
Date: 2021-05-20 17:25:36

On 5/14/21 10:48 PM, Thomas Gleixner wrote:
On Fri, Apr 30 2021 at 10:03, Cédric Le Goater wrote:
quoted
The MSI affinity is automanaged and it can be set before starting the
associated IRQ.

( Should we simply remove the irqd_is_started() test ? )
If the hardware can handle it properly.

But see:

  cffb717ceb8e ("powerpc/xive: Ensure active irqd when setting affinity")
Thanks for digging. That's a patch from the early days of XIVE support. 
which introduced that condition. It mutters something about migration of
shutdown interrupts:

       [  123.053037264,3] XIVE[ IC 00  ] ISN 2 lead to invalid IVE !
The XIVE driver in OPAL is complaining.

Linux is trying to configure the target of HW IRQ number 2 but OPAL refuses
because it's invalid. The first 16 are reserved (like on Linux).

So it's another problem. 2 could be a value from an "interrupts" property,
giving the INTx number assigned to a PCI device or an OPAL event IRQ 
number leaked into the XIVE domain. Given the low Linux IRQ number that 
might be the latter. 
       [   77.885859] xive: Error -6 reconfiguring irq 17
       [   77.885862] IRQ17: set affinity failed(-6).

Not that I can decode that :)
A device name would help but you have guessed most of it ;)
Non-managed interrupts have the sequence:

      startup()
      set_affinity()

which is historical and an earlier attempt to flip it caused havoc in
some places.

With managed we needed to make sure that the affinity is set correctly
right at start. So it needs to be done the other way round and it turned
out that for MSI this works.

I have no idea, whether that might make the above issue reappear or
not. If so, then we need some extra state to make it work.

The root cause which triggered the problem got fixed, so there should be
no issue _if_ this was specifically related to that CPU unplug case.
I would vote for this option. I will simply remove the irqd_is_started() 
test which looks bogus and do some extra tests on all platforms.

Thanks,

C.


 
quoted
diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 96737938e8e3..3485baf9ec8c 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -710,7 +710,7 @@ static int xive_irq_set_affinity(struct irq_data *d,
 		return -EINVAL;
 
 	/* Don't do anything if the interrupt isn't started */
-	if (!irqd_is_started(d))
+	if (!irqd_is_started(d) && !irqd_affinity_is_managed(d))
 		return IRQ_SET_MASK_OK;
 
 	/*
Thanks,

        tglx
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help