Thread: 11 messages, 6 authors, 2020-06-03

Re: [PATCH] irqchip/gic-v3-its: Don't try to move a disabled irq

From: Marc Zyngier <maz@kernel.org>
Date: 2020-05-30 16:49:34
Also in: lkml

Hi Ali,

On Fri, 29 May 2020 12:36:42 +0000
"Saidi, Ali" [off-list ref] wrote:
> Hi Marc,
>
> > On May 29, 2020, at 3:33 AM, Marc Zyngier [off-list ref] wrote:
> >
> > Hi Ali,
> >
> > > On 2020-05-29 02:55, Ali Saidi wrote:
> > > If an interrupt is disabled the ITS driver has sent a discard removing
> > > the DeviceID and EventID from the ITT. After this occurs it can't be
> > > moved to another collection with a MOVI and a command error occurs if
> > > attempted. Before issuing the MOVI command make sure that the IRQ isn't
> > > disabled and change the activate code to try and use the previous
> > > affinity.
> > >
> > > Signed-off-by: Ali Saidi <redacted>
> > > ---
> > > drivers/irqchip/irq-gic-v3-its.c | 18 +++++++++++++++---
> > > 1 file changed, 15 insertions(+), 3 deletions(-)
> > > diff --git a/drivers/irqchip/irq-gic-v3-its.c
> > > b/drivers/irqchip/irq-gic-v3-its.c
> > > index 124251b0ccba..1235dd9a2fb2 100644
> > > --- a/drivers/irqchip/irq-gic-v3-its.c
> > > +++ b/drivers/irqchip/irq-gic-v3-its.c
> > > @@ -1540,7 +1540,11 @@ static int its_set_affinity(struct irq_data *d,
> > > const struct cpumask *mask_val,
> > >      /* don't set the affinity when the target cpu is same as current one
> > > */
> > >      if (cpu != its_dev->event_map.col_map[id]) {
> > >              target_col = &its_dev->its->collections[cpu];
> > > -             its_send_movi(its_dev, target_col, id);
> > > +
> > > +             /* If the IRQ is disabled a discard was sent so don't move */
> > > +             if (!irqd_irq_disabled(d))
> > > +                     its_send_movi(its_dev, target_col, id);
> > > +
> >
> > This looks wrong. What you are testing here is whether the interrupt
> > is masked, not that there isn't a valid translation.
>
> I’m not exactly sure of the correct condition, but what I’m looking for
> is interrupts which are deactivated and for which we have thus sent a
> discard. That looks like IRQD_IRQ_STARTED not being set in this case.
>
> > In the commit message, you're saying that we've issued a discard.
> > This hints at doing a set_affinity on an interrupt that has been
> > deactivated (mapping removed). Is that actually the case? If so,
> > why was it deactivated in the first place?
>
> This is the case. If we down a NIC, that interface’s MSIs will be
> deactivated but remain allocated until the device is unbound from the
> driver or the NIC is brought up.
>
> While stressing down/up a device I’ve found that irqbalance can move
> interrupts and you end up with the situation described. The device is
> downed, the interrupts are deactivated but still present, and then
> trying to move one results in sending a MOVI after the DISCARD, which
> is an error per the GIC spec.

Not great indeed. But this is not, as far as I can tell, a GIC
driver problem.

The semantic of activate/deactivate (which maps to started/shutdown
in the IRQ code) is that the HW resources for a given interrupt are
only committed when the interrupt is activated. Trying to perform
actions involving the HW on an interrupt that isn't active cannot be
guaranteed to take effect.
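
To make the semantic above concrete, here is a minimal user-space sketch. It is a toy model and not kernel code: every name in it (toy_its, toy_activate, toy_deactivate, toy_movi, TOY_NR_EVENTS) is hypothetical, and -EINVAL merely stands in for the ITS command error. Activation is assumed to commit the translation (roughly what a MAPTI does), deactivation drops it (DISCARD), and a move (MOVI) only works while the translation exists.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical toy model of an ITS translation table, not kernel code. */
#define TOY_NR_EVENTS 4

struct toy_its {
	bool mapped[TOY_NR_EVENTS];     /* is there a translation for this event? */
	int  collection[TOY_NR_EVENTS]; /* target collection (CPU) while mapped */
};

static bool toy_event_valid(int event)
{
	return event >= 0 && event < TOY_NR_EVENTS;
}

/* Activate: commit the HW resource (stands in for a MAPTI). */
static int toy_activate(struct toy_its *its, int event, int cpu)
{
	if (!toy_event_valid(event))
		return -EINVAL;
	its->mapped[event] = true;
	its->collection[event] = cpu;
	return 0;
}

/* Deactivate: release the HW resource (stands in for a DISCARD). */
static void toy_deactivate(struct toy_its *its, int event)
{
	if (toy_event_valid(event))
		its->mapped[event] = false;
}

/* Move: stands in for a MOVI, which fails without a translation. */
static int toy_movi(struct toy_its *its, int event, int cpu)
{
	if (!toy_event_valid(event) || !its->mapped[event])
		return -EINVAL; /* models the command error */
	its->collection[event] = cpu;
	return 0;
}
```

In this model, calling toy_movi() after toy_deactivate() fails and leaves the recorded collection untouched, which is the MOVI-after-DISCARD situation reported in this thread.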

I'd rather address it in the core code, by preventing set_affinity (and
potentially others) from taking place when the interrupt is not in the
STARTED state. Userspace would get an error, which is perfectly
legitimate, and which it already has to deal with for plenty of other
reasons.
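
As a rough illustration of that idea, the following user-space sketch rejects an affinity change before the irqchip callback is ever reached. It is not the actual core change: toy_irq, toy_chip_set_affinity and toy_core_set_affinity are hypothetical names, the error code is picked for illustration, and the "started" field only stands in for the STARTED state (which the kernel exposes through irqd_is_started()).

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the core's per-interrupt state. */
struct toy_irq {
	bool started;  /* models the STARTED state */
	int  cpu;      /* current target CPU */
	int  hw_moves; /* number of times the chip callback touched "HW" */
};

/* Stand-in for an irqchip's set_affinity callback. */
static int toy_chip_set_affinity(struct toy_irq *irq, int cpu)
{
	irq->cpu = cpu;
	irq->hw_moves++;
	return 0;
}

/* Core-level entry point: refuse before the chip gets involved. */
static int toy_core_set_affinity(struct toy_irq *irq, int cpu)
{
	if (!irq->started)
		return -EINVAL; /* error code chosen for illustration only */
	return toy_chip_set_affinity(irq, cpu);
}
```

With the check at this level, an irqchip driver never sees a move request for an interrupt whose HW resources are not committed, and the caller gets an error it can handle.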
  
> > >              its_dev->event_map.col_map[id] = cpu;
> > >              irq_data_update_effective_affinity(d,
> > > cpumask_of(cpu)); }
> > > @@ -3439,8 +3443,16 @@ static int its_irq_domain_activate(struct
> > > irq_domain *domain,
> > >      if (its_dev->its->numa_node >= 0)
> > >              cpu_mask = cpumask_of_node(its_dev->its->numa_node);
> > >
> > > -     /* Bind the LPI to the first possible CPU */
> > > -     cpu = cpumask_first_and(cpu_mask, cpu_online_mask);
> > > +     /* If the cpu set to a different CPU that is still online
> > > use it */
> > > +     cpu = its_dev->event_map.col_map[event];
> > > +
> > > +     cpumask_and(cpu_mask, cpu_mask, cpu_online_mask);
> > > +
> > > +     if (!cpumask_test_cpu(cpu, cpu_mask)) {
> > > +             /* Bind the LPI to the first possible CPU */
> > > +             cpu = cpumask_first(cpu_mask);
> > > +     }
> > > +
> > >      if (cpu >= nr_cpu_ids) {
> > >              if (its_dev->its->flags &
> > > ITS_FLAGS_WORKAROUND_CAVIUM_23144) return -EINVAL;
> >
> > So you deactivate an interrupt, do a set_affinity that doesn't issue
> > a MOVI but preserves the affinity, then reactivate it and hope that
> > the new mapping will target the "right" CPU.
> >
> > That seems a bit mad, but I presume this isn't the whole story...
>
> Doing some experiments it appears as though other interrupt
> controllers do preserve affinity across deactivate/activate, so this
> is my attempt at doing the same.

I believe this is only an artefact of these other controllers not
requiring any resource to be committed into the HW (SPIs wouldn't care,
for example).

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel