Thread (12 messages) 12 messages, 5 authors, 2021-05-10

Re: Question on guest enable msi fail when using GICv4/4.1

From: Auger Eric <eric.auger@redhat.com>
Date: 2021-05-09 17:02:07
Also in: kvm, kvmarm, linux-pci

Hi,
On 5/7/21 1:02 PM, Marc Zyngier wrote:
On Fri, 07 May 2021 10:58:23 +0100,
Shaokun Zhang [off-list ref] wrote:
quoted
Hi Marc,

Thanks for your quick reply.

On 2021/5/7 17:03, Marc Zyngier wrote:
quoted
On Fri, 07 May 2021 06:57:04 +0100,
Shaokun Zhang [off-list ref] wrote:
quoted
[This letter comes from Nianyao Tang]

Hi,

Using GICv4/4.1 and msi capability, guest vf driver requires 3
vectors and enable msi, will lead to guest stuck.
Stuck how?
Guest serial does not response anymore and guest network shutdown.
quoted
quoted
Qemu gets number of interrupts from Multiple Message Capable field
set by guest. This field is aligned to a power of 2(if a function
requires 3 vectors, it initializes it to 2).
So I guess this is a MultiMSI device with 4 vectors, right?
Yes, it can support maximum of 32 msi interrupts, and vf driver only use 3 msi.
quoted
quoted
However, guest driver just sends 3 mapi-cmd to vits and 3 ite
entries is recorded in host.  Vfio initializes msi interrupts using
the number of interrupts 4 provide by qemu.  When it comes to the
4th msi without ite in vits, in irq_bypass_register_producer,
producer and consumer will __connect fail, due to find_ite fail, and
do not resume guest.
Let me rephrase this to check that I understand it:
- The device has 4 vectors
- The guest only create mappings for 3 of them
- VFIO calls kvm_vgic_v4_set_forwarding() for each vector
- KVM doesn't have a mapping for the 4th vector and returns an error
- VFIO disable this 4th vector

Is that correct? If yes, I don't understand why that impacts the guest
at all. From what I can see, vfio_msi_set_vector_signal() just prints
a message on the console and carries on.
function calls:
--> vfio_msi_set_vector_signal
   --> irq_bypass_register_producer
      -->__connect

in __connect, add_producer finally calls kvm_vgic_v4_set_forwarding
and fails to get the 4th mapping. When add_producer fail, it does
not call cons->start, calls kvm_arch_irq_bypass_start and then
kvm_arm_resume_guest.
[+Eric, who wrote the irq_bypass infrastructure.]

Ah, so the guest is actually paused, not in a livelock situation
(which is how I interpreted "stuck").

I think we should handle this case gracefully, as there should be no
expectation that the guest will be using this interrupt. Given that
VFIO seems to be pretty unfazed when a producer fails, I'm temped to
do the same thing and restart the guest.

Also, __disconnect doesn't care about errors, so why should __connect
have this odd behaviour?
_disconnect() does not care as we should always succeed tearing off
things. del_* ops are void functions. On the opposite we can fail
setting up the bypass.

Effectively
a979a6aa009f ("irqbypass: do not start cons/prod when failed connect")
needs to be reverted.

I agree the kerneldoc comments in linux/irqbypass.h may be improved to
better explain the role of stop/start cbs and warn about their potential
global impact.

wrt the case above, "in __connect, add_producer finally calls
kvm_vgic_v4_set_forwarding and fails to get the 4th mapping", shouldn't
we succeed in that case?

Thanks

Eric
quoted hunk ↗ jump to hunk
Can you please try this? It is completely untested (and I think the
del_consumer call is odd, which is why I've also dropped it).

Eric, what do you think?

Thanks,

	M.
diff --git a/virt/lib/irqbypass.c b/virt/lib/irqbypass.c
index c9bb3957f58a..7e1865e15668 100644
--- a/virt/lib/irqbypass.c
+++ b/virt/lib/irqbypass.c
@@ -40,21 +40,14 @@ static int __connect(struct irq_bypass_producer *prod,
 	if (prod->add_consumer)
 		ret = prod->add_consumer(prod, cons);
 
-	if (ret)
-		goto err_add_consumer;
-
-	ret = cons->add_producer(cons, prod);
-	if (ret)
-		goto err_add_producer;
+	if (!ret)
+		ret = cons->add_producer(cons, prod);
 
 	if (cons->start)
 		cons->start(cons);
 	if (prod->start)
 		prod->start(prod);
-err_add_producer:
-	if (prod->del_consumer)
-		prod->del_consumer(prod, cons);
-err_add_consumer:
+
 	return ret;
 }
 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help