Thread (12 messages) 12 messages, 5 authors, 2021-05-10

Re: Question on guest enable msi fail when using GICv4/4.1

From: Marc Zyngier <maz@kernel.org>
Date: 2021-05-07 11:05:11
Also in: kvm, kvmarm, linux-pci
Subsystem: the rest, virt lib · Maintainers: Linus Torvalds, Alex Williamson, Paolo Bonzini

On Fri, 07 May 2021 10:58:23 +0100,
Shaokun Zhang [off-list ref] wrote:
Hi Marc,

Thanks for your quick reply.

On 2021/5/7 17:03, Marc Zyngier wrote:
quoted
On Fri, 07 May 2021 06:57:04 +0100,
Shaokun Zhang [off-list ref] wrote:
quoted
[This letter comes from Nianyao Tang]

Hi,

Using GICv4/4.1 and msi capability, guest vf driver requires 3
vectors and enable msi, will lead to guest stuck.
Stuck how?
Guest serial does not response anymore and guest network shutdown.
quoted
quoted
Qemu gets number of interrupts from Multiple Message Capable field
set by guest. This field is aligned to a power of 2(if a function
requires 3 vectors, it initializes it to 2).
So I guess this is a MultiMSI device with 4 vectors, right?
Yes, it can support maximum of 32 msi interrupts, and vf driver only use 3 msi.
quoted
quoted
However, guest driver just sends 3 mapi-cmd to vits and 3 ite
entries is recorded in host.  Vfio initializes msi interrupts using
the number of interrupts 4 provide by qemu.  When it comes to the
4th msi without ite in vits, in irq_bypass_register_producer,
producer and consumer will __connect fail, due to find_ite fail, and
do not resume guest.
Let me rephrase this to check that I understand it:
- The device has 4 vectors
- The guest only create mappings for 3 of them
- VFIO calls kvm_vgic_v4_set_forwarding() for each vector
- KVM doesn't have a mapping for the 4th vector and returns an error
- VFIO disable this 4th vector

Is that correct? If yes, I don't understand why that impacts the guest
at all. From what I can see, vfio_msi_set_vector_signal() just prints
a message on the console and carries on.
function calls:
--> vfio_msi_set_vector_signal
   --> irq_bypass_register_producer
      -->__connect

in __connect, add_producer finally calls kvm_vgic_v4_set_forwarding
and fails to get the 4th mapping. When add_producer fail, it does
not call cons->start, calls kvm_arch_irq_bypass_start and then
kvm_arm_resume_guest.
[+Eric, who wrote the irq_bypass infrastructure.]

Ah, so the guest is actually paused, not in a livelock situation
(which is how I interpreted "stuck").

I think we should handle this case gracefully, as there should be no
expectation that the guest will be using this interrupt. Given that
VFIO seems to be pretty unfazed when a producer fails, I'm temped to
do the same thing and restart the guest.

Also, __disconnect doesn't care about errors, so why should __connect
have this odd behaviour?

Can you please try this? It is completely untested (and I think the
del_consumer call is odd, which is why I've also dropped it).

Eric, what do you think?

Thanks,

	M.
diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
index c9bb3957f58a..7e1865e15668 100644
--- a/virt/lib/irqbypass.c
+++ b/virt/lib/irqbypass.c
@@ -40,21 +40,14 @@ static int __connect(struct irq_bypass_producer *prod,
 	if (prod->add_consumer)
 		ret = prod->add_consumer(prod, cons);
 
-	if (ret)
-		goto err_add_consumer;
-
-	ret = cons->add_producer(cons, prod);
-	if (ret)
-		goto err_add_producer;
+	if (!ret)
+		ret = cons->add_producer(cons, prod);
 
 	if (cons->start)
 		cons->start(cons);
 	if (prod->start)
 		prod->start(prod);
-err_add_producer:
-	if (prod->del_consumer)
-		prod->del_consumer(prod, cons);
-err_add_consumer:
+
 	return ret;
 }
 
-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help