
[PATCH v2 14/39] KVM: arm64: gic-v5: Set up VMTEs and VPE doorbells

From: Sascha Bischoff <hidden>
Date: 2026-05-21 14:55:21
Also in: kvm, kvmarm
Subsystem: arm64 port (aarch64 architecture), kernel virtual machine for arm64 (kvm/arm64), the rest · Maintainers: Catalin Marinas, Will Deacon, Marc Zyngier, Oliver Upton, Linus Torvalds

A GICv5 VM needs a VM table entry (VMTE) before it can use SPIs and
LPIs, which are backed by the host IRS. The VM table itself is created
at probe time, but each VM must still claim and populate its own VMTE.

Allocate a VM ID during vgic_v5_init(). The VM ID is also the index
into the VM table, so allocating it selects the VMTE slot that will be
used for the lifetime of the VM.
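
The ID-as-index relationship can be modelled in user space. The sketch
below is not the allocator used by this series (the bodies of
vgic_v5_allocate_vm_id() and vgic_v5_release_vm_id() are not part of this
patch); every sketch_* name, the table size, and the sentinel value are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size and sentinel; the real values are defined elsewhere. */
#define SKETCH_NR_VM_IDS   8
#define SKETCH_VM_ID_INVAL UINT16_MAX

/* One flag per VM table slot: true while a VM owns the slot. */
static bool sketch_vmt_slot_used[SKETCH_NR_VM_IDS];

/* Claim the lowest free slot; the returned ID doubles as the table index. */
static uint16_t sketch_alloc_vm_id(void)
{
	for (uint16_t id = 0; id < SKETCH_NR_VM_IDS; id++) {
		if (!sketch_vmt_slot_used[id]) {
			sketch_vmt_slot_used[id] = true;
			return id;
		}
	}

	return SKETCH_VM_ID_INVAL;	/* table full */
}

/* Release the slot so a later VM can reuse it; repeat calls are no-ops. */
static void sketch_release_vm_id(uint16_t *id)
{
	if (*id == SKETCH_VM_ID_INVAL)
		return;

	assert(*id < SKETCH_NR_VM_IDS);
	sketch_vmt_slot_used[*id] = false;
	*id = SKETCH_VM_ID_INVAL;
}
```

Resetting the caller's ID to the sentinel on release is what lets a second
release (or a second teardown) return early without touching the table.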

Create a per-VM VPE doorbell irq domain, allocate one doorbell
interrupt per vcpu, request the interrupts, and keep the doorbell IRQ
number in the vcpu's GICv5 state. The doorbell handler itself marks
the VPE doorbell as fired, raises KVM_REQ_IRQ_PENDING, and kicks the
target vcpu so that KVM can re-evaluate pending interrupt state.

With the VM ID and doorbells in place, initialise the VMTE backing
state, including the VM descriptor and VPE table. The doorbells have
to exist before making the VMTE valid, as they provide the IRQ-side
conduit used by the IRS commands. Make the VMTE valid via the IRS,
then allocate the VPE state for each vcpu.

Add vgic_v5_teardown() to unwind the state in the reverse order. Make
the VMTE invalid, free the VPE state, release the VMTE backing state,
free the doorbell IRQs and irq domain, and finally release the VM ID
so that the VMTE slot can be reused by a later VM.

On init failure, call the same teardown path so that partially created
state is unwound consistently.
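
The shape of that error handling, reduced to a user-space model, might
look like the following. This is a sketch under stated assumptions, not
the code in this patch: struct sketch_vm, its fields, and the fail_at
parameter are all hypothetical and exist only to exercise the error path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-VM state; field names are illustrative. */
struct sketch_vm {
	bool has_id;
	bool has_doorbells;
	bool vmte_valid;
};

/* Unwind in reverse order of init, coping with partially created state. */
static void sketch_teardown(struct sketch_vm *vm)
{
	assert(vm);

	if (!vm->has_id)
		return;		/* failed very early, or already torn down */

	vm->vmte_valid = false;
	vm->has_doorbells = false;
	vm->has_id = false;
}

/* fail_at selects the step that fails (0 means none of them fail). */
static int sketch_init(struct sketch_vm *vm, int fail_at)
{
	int ret = -1;

	if (fail_at == 1)
		return ret;	/* nothing claimed yet, nothing to unwind */
	vm->has_id = true;

	if (fail_at == 2)
		goto err;
	vm->has_doorbells = true;

	if (fail_at == 3)
		goto err;
	vm->vmte_valid = true;

	return 0;

err:
	/* One teardown path serves both init failure and VM destruction. */
	sketch_teardown(vm);
	return ret;
}
```

Because teardown returns early once the ID is gone, calling it again at VM
destruction after a failed init is harmless, and a later retry of init
starts from clean state.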

As part of resetting VCPUs, mark them as valid in the VM's VPE table.
This informs the IRS that a specific VPE may be made resident; without
it, the IRS treats the VPE as invalid.

Also introduce vgic_v5_send_command(), a wrapper around the VPE
doorbells. It takes a struct kvm_vcpu pointer and the command to run,
and invokes the function bound to that command via the vcpu's doorbell
using irq_set_vcpu_affinity(). This is a convenience function to
simplify the code.
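
The wrapper relies on passing the command through an opaque pointer to a
callback that decodes it. A minimal user-space model of that dispatch is
sketched below; the sketch_* names, the command set, and the state flags
are assumptions for illustration and do not reflect what the real doorbell
callback does with each command.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command set, loosely mirroring the names used in the patch. */
enum sketch_cmd {
	SKETCH_VMTE_MAKE_VALID,
	SKETCH_VMTE_MAKE_INVALID,
	SKETCH_VPE_MAKE_VALID,
};

/* Hypothetical per-VPE state that the commands act on. */
struct sketch_vpe {
	bool vmte_valid;
	bool vpe_valid;
};

/* Receives the command via an opaque pointer, as an irqchip callback would. */
static int sketch_db_dispatch(struct sketch_vpe *vpe, void *info)
{
	enum sketch_cmd cmd = *(enum sketch_cmd *)info;

	assert(vpe);

	switch (cmd) {
	case SKETCH_VMTE_MAKE_VALID:
		vpe->vmte_valid = true;
		return 0;
	case SKETCH_VMTE_MAKE_INVALID:
		vpe->vmte_valid = false;
		return 0;
	case SKETCH_VPE_MAKE_VALID:
		vpe->vpe_valid = true;
		return 0;
	default:
		return -1;	/* unknown command */
	}
}

/* Convenience wrapper: callers name the command, not the plumbing. */
static int sketch_send_command(struct sketch_vpe *vpe, enum sketch_cmd cmd)
{
	/* cmd lives on this stack frame, so dispatch must be synchronous. */
	return sketch_db_dispatch(vpe, &cmd);
}
```

Taking the address of a by-value parameter is only safe because the
callee consumes it before the wrapper returns.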

Signed-off-by: Sascha Bischoff <redacted>
---
 arch/arm64/kvm/vgic/vgic-v5.c | 154 +++++++++++++++++++++++++++++++---
 1 file changed, 144 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index 6a312c24d0b31..08f2411c0a134 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -401,6 +401,23 @@ static int vgic_v5_irs_set_up_vpe(u16 vm_id, u16 vpe_id,
 	return 0;
 }
 
+static irqreturn_t db_handler(int irq, void *data)
+{
+	struct kvm_vcpu *vcpu = data;
+
+	WRITE_ONCE(vcpu->arch.vgic_cpu.vgic_v5.gicv5_vpe.db_fired, true);
+
+	kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
+	kvm_vcpu_kick(vcpu);
+
+	return IRQ_HANDLED;
+}
+
+static int vgic_v5_send_command(struct kvm_vcpu *vcpu, enum gicv5_vcpu_cmd cmd)
+{
+	return irq_set_vcpu_affinity(vgic_v5_vpe_db(vcpu), &cmd);
+}
+
 static int vgic_v5_db_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
 {
 	struct vgic_v5_vm *vm = data->domain->host_data;
@@ -575,33 +592,101 @@ void vgic_v5_reset(struct kvm_vcpu *vcpu)
 	 * CPUIF (but potentially fewer in the IRS).
 	 */
 	vcpu->arch.vgic_cpu.num_pri_bits = 5;
+
+	/* Make the VPE valid in the VPET */
+	if (WARN_ON(vgic_v5_send_command(vcpu, VPE_MAKE_VALID)))
+		return;
+}
+
+static void vgic_v5_free_doorbells(struct kvm *kvm, unsigned int nr_dbs)
+{
+	struct vgic_v5_vm *vm = &kvm->arch.vgic.gicv5_vm;
+	struct kvm_vcpu *vcpu;
+	unsigned long i;
+	int db;
+
+	for (i = 0; i < nr_dbs; i++) {
+		vcpu = kvm_get_vcpu(kvm, i);
+		db = vgic_v5_vpe_db(vcpu);
+		if (!db)
+			continue;
+
+		free_irq(db, vcpu);
+		vcpu->arch.vgic_cpu.vgic_v5.gicv5_vpe.db = 0;
+	}
+
+	if (vm->vpe_db_base) {
+		irq_domain_free_irqs(vm->vpe_db_base,
+				     atomic_read(&kvm->online_vcpus));
+		vm->vpe_db_base = 0;
+	}
 }
 
 void vgic_v5_teardown(struct kvm *kvm)
 {
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	struct kvm_vcpu *vcpu, *vcpu0;
+	unsigned long i;
+	int rc;
+
+	/*
+	 * If the VM's ID isn't valid, then we either failed init very early or
+	 * we've been called a second time. Nothing to do here in either case.
+	 */
+	if (kvm->arch.vgic.gicv5_vm.vm_id == VGIC_V5_VM_ID_INVAL)
+		return;
+
+	if (kvm->arch.vgic.gicv5_vm.vmte_allocated) {
+		/* Make the VMTE invalid */
+		vcpu0 = kvm_get_vcpu(kvm, 0);
+		rc = vgic_v5_send_command(vcpu0, VMTE_MAKE_INVALID);
+		if (rc)
+			kvm_err("could not make VMTE invalid\n");
+
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			if (vgic_v5_vmte_free_vpe(vcpu))
+				kvm_err("Failed to free VPE\n");
+		}
+
+		if (vgic_v5_vmte_release(kvm))
+			kvm_err("Failed to release VM 0x%x\n", dist->gicv5_vm.vm_id);
+	}
+
+	vgic_v5_free_doorbells(kvm, atomic_read(&kvm->online_vcpus));
+
 	vgic_v5_teardown_per_vm_domain(&kvm->arch.vgic.gicv5_vm);
+
+	vgic_v5_release_vm_id(kvm);
 }
 
+/*
+ * Claim and populate a VMTE (optionally making a new L2 VMT valid), create VPE
+ * doorbells, then allocate the VPET and populate it for each VPE.
+ *
+ * Note: We do need to put the cart before the horse here. The VPE doorbells are
+ * our conduit for communication with the IRS, which means we need to have those
+ * before making the VMTE valid.
+ *
+ * On failure, we clean up in the teardown path (vgic_v5_teardown()).
+ */
 int vgic_v5_init(struct kvm *kvm)
 {
-	struct kvm_vcpu *vcpu;
-	unsigned long idx;
-	int ret;
+	struct kvm_vcpu *vcpu, *vcpu0;
+	int nr_vcpus, ret = 0;
+	unsigned int db_virq;
+	unsigned long i;
 
-	if (vgic_initialized(kvm))
-		return 0;
+	nr_vcpus = atomic_read(&kvm->online_vcpus);
+	if (nr_vcpus == 0)
+		return -ENODEV;
 
-	kvm_for_each_vcpu(idx, vcpu, kvm) {
+	kvm_for_each_vcpu(i, vcpu, kvm) {
 		if (vcpu_has_nv(vcpu)) {
 			kvm_err("Nested GICv5 VMs are currently unsupported\n");
 			return -EINVAL;
 		}
 	}
 
-	ret = vgic_v5_create_per_vm_domain(kvm);
-	if (ret)
-		return ret;
-
 	/* We only allow userspace to drive the SW_PPI, if it is implemented. */
 	bitmap_zero(kvm->arch.vgic.gicv5_vm.userspace_ppis,
 		    VGIC_V5_NR_PRIVATE_IRQS);
@@ -610,7 +695,56 @@ int vgic_v5_init(struct kvm *kvm)
 		   kvm->arch.vgic.gicv5_vm.userspace_ppis,
 		   ppi_caps.impl_ppi_mask, VGIC_V5_NR_PRIVATE_IRQS);
 
+	ret = vgic_v5_allocate_vm_id(kvm);
+	if (ret)
+		return ret;
+
+	ret = vgic_v5_create_per_vm_domain(kvm);
+	if (ret)
+		goto err;
+
+	db_virq = kvm->arch.vgic.gicv5_vm.vpe_db_base;
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		ret = request_irq(db_virq + i, db_handler, 0, "vcpu", vcpu);
+		if (ret)
+			goto err;
+
+		/* Stash it with the VCPU for easy retrieval */
+		vcpu->arch.vgic_cpu.vgic_v5.gicv5_vpe.db = db_virq + i;
+	}
+
+	/* Populate VMTE (with VPET and VM descriptor) */
+	ret = vgic_v5_vmte_init(kvm);
+	if (ret)
+		goto err;
+
+	/* We pick the first vcpu to make the VMTE valid - any would do */
+	vcpu0 = kvm_get_vcpu(kvm, 0);
+	ret = vgic_v5_send_command(vcpu0, VMTE_MAKE_VALID);
+	if (ret)
+		goto err;
+
+	/* Loop over all VPEs, allocate/populate their data structures */
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		ret = vgic_v5_vmte_alloc_vpe(vcpu);
+		if (ret)
+			goto err;
+	}
+
 	return 0;
+
+err:
+	/*
+	 * Explicitly tear everything down on failure. The teardown function is
+	 * written to handle any partial state we might have, so we don't need
+	 * to do any clean-up first. Teardown will be called a second time on VM
+	 * destruction, but that's fine - it is better to leave things in a
+	 * clean state now, and doubly so because userspace could actually go
+	 * and retry init.
+	 */
+	vgic_v5_teardown(kvm);
+
+	return ret;
 }
 
 int vgic_v5_map_resources(struct kvm *kvm)
-- 
2.34.1