[bug report] KVM: arm64: vgic-v4: Occasionally issue VMOVP to an unmapped VPE on GICv4.1
From: Kunkun Jiang <hidden>
Date: 2024-09-29 07:20:33
Also in:
kvmarm
Hi all,

I found a problem on GICv4.1: a VMOVP is occasionally issued to an unmapped VPE. In my test environment, operating on an unmapped VPE raises a RAS error, which is how I noticed it. The detailed analysis follows.

vgic_v4_teardown() is executed when a VM is destroyed, to free the GICv4 data structures. The code is as follows:

/**
 * vgic_v4_teardown - Free the GICv4 data structures
 * @kvm:	Pointer to the VM being destroyed
 */
void vgic_v4_teardown(struct kvm *kvm)
{
	struct its_vm *its_vm = &kvm->arch.vgic.its_vm;
	int i;

	lockdep_assert_held(&kvm->arch.config_lock);

	if (!its_vm->vpes)
		return;

	for (i = 0; i < its_vm->nr_vpes; i++) {
		struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, i);
		int irq = its_vm->vpes[i]->irq;

		irq_clear_status_flags(irq, DB_IRQ_FLAGS);
		free_irq(irq, vcpu);
	}

	its_free_vcpu_irqs(its_vm);
	kfree(its_vm->vpes);
	its_vm->nr_vpes = 0;
	its_vm->vpes = NULL;
}

[1] irq_clear_status_flags(irq, DB_IRQ_FLAGS) clears the status flags of the doorbell. DB_IRQ_FLAGS contains IRQ_NO_BALANCING, so after this call irqbalance.service can change the affinity of the doorbell.

[2] free_irq() unmaps the VPE.

[3] its_free_vcpu_irqs(its_vm) calls unregister_irq_proc() to remove the doorbell's entries under /proc/irq/xx/.

For a large VM with many VPEs, there is a certain time window between [2] and [3] during which irqbalance.service has a chance to move the doorbell. As a result, a VMOVP is issued to an unmapped VPE.

I tried not clearing IRQ_NO_BALANCING, and the problem went away. However, it is not clear to me whether doing so causes any other problem.

Thanks,
Kunkun Jiang
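
To make the window between [2] and [3] easier to reason about, here is a minimal userspace model of the sequence described above. It is only a sketch and not kernel code: struct doorbell and every helper name are hypothetical, the flag bit values are arbitrary, and the composition of DB_IRQ_FLAGS is an assumption (the report only relies on it containing IRQ_NO_BALANCING). The model also assumes that a doorbell can be moved from userspace only while its /proc/irq/xx/ entry exists and IRQ_NO_BALANCING is clear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; NOT the values from the kernel headers. */
#define IRQ_NOAUTOEN		(1u << 0)
#define IRQ_DISABLE_UNLAZY	(1u << 1)
#define IRQ_NO_BALANCING	(1u << 2)

/* Assumed composition; the report only states it contains IRQ_NO_BALANCING. */
#define DB_IRQ_FLAGS	(IRQ_NOAUTOEN | IRQ_DISABLE_UNLAZY | IRQ_NO_BALANCING)

/* Hypothetical model of one doorbell interrupt and its VPE. */
struct doorbell {
	uint32_t status;	/* status flags of the doorbell */
	bool vpe_mapped;	/* becomes false at step [2] */
	bool proc_visible;	/* /proc/irq/xx/ entry, removed at step [3] */
};

/* State of a doorbell before teardown starts. */
static void doorbell_init(struct doorbell *db)
{
	assert(db);
	db->status = DB_IRQ_FLAGS;
	db->vpe_mapped = true;
	db->proc_visible = true;
}

/* Steps [1] and [2] for one doorbell; clear_mask is what step [1] clears. */
static void teardown_steps_1_2(struct doorbell *db, uint32_t clear_mask)
{
	assert(db);
	db->status &= ~clear_mask;	/* [1] clear status flags */
	db->vpe_mapped = false;		/* [2] free_irq() unmaps the VPE */
}

/* Step [3]: the /proc/irq/xx/ entry goes away. */
static void teardown_step_3(struct doorbell *db)
{
	assert(db);
	db->proc_visible = false;
}

/* Model assumption: userspace can move the doorbell only in this state. */
static bool balancer_can_move(const struct doorbell *db)
{
	return db->proc_visible && !(db->status & IRQ_NO_BALANCING);
}

/* True if a move right now would target an unmapped VPE. */
static bool move_hits_unmapped_vpe(const struct doorbell *db)
{
	return balancer_can_move(db) && !db->vpe_mapped;
}
```

In this model, calling teardown_steps_1_2() with DB_IRQ_FLAGS leaves the doorbell movable while its VPE is already unmapped, until teardown_step_3() runs. Calling it with DB_IRQ_FLAGS & ~IRQ_NO_BALANCING closes that window, which corresponds to the experiment of not clearing IRQ_NO_BALANCING. Whether keeping that flag set has side effects in the real kernel is exactly the open question and is not captured by this sketch.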