Re: [PATCH 5/5] KVM: TDX: Add sub-ioctl KVM_TDX_TERMINATE_VM
From: Chao Gao <hidden>
Date: 2025-08-06 06:06:46
Also in:
kvm, kvmarm, lkml
quoted hunk ↗ jump to hunk
Oof. And as Chao pointed out[*], removing the vm_dead check would allow creating and running vCPUs in a dead VM, which is most definitely not desirable. Squashing the vCPU creation case is easy enough if we keep vm_dead but still generally allow ioctls, and it's probably worth doing that no matter what (to plug the hole where pending vCPU creations could succeed):diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d477a7fda0ae..941d2c32b7dc 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c@@ -4207,6 +4207,11 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id) mutex_lock(&kvm->lock); + if (kvm->vm_dead) { + r = -EIO; + goto unlock_vcpu_destroy; + } +
yes. this addresses my concern.
quoted hunk ↗ jump to hunk
if (kvm_get_vcpu_by_id(kvm, id)) { r = -EEXIST; goto unlock_vcpu_destroy; And then to ensure vCPUs can't do anything, check KVM_REQ_VM_DEAD after acquiring vcpu->mutex.diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 6c07dd423458..883077eee4ce 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c@@ -4433,6 +4433,12 @@ static long kvm_vcpu_ioctl(struct file *filp, if (mutex_lock_killable(&vcpu->mutex)) return -EINTR; + + if (kvm_test_request(KVM_REQ_VM_DEAD, vcpu)) { + r = -EIO; + goto out; + } + switch (ioctl) { case KVM_RUN: { struct pid *oldpid;That should address all TDVPS paths (I hope), and I _think_ would address all MMU-related paths as well? E.g. prefault requires a vCPU. Disallowing (most) vCPU ioctls but not all VM ioctls on vm_dead isn't great ABI (understatement), but I think we need/want the above changes even if we keep the general vm_dead restriction. And given the extremely ad hoc behavior of taking kvm->lock for VM ioctls, trying to enforce vm_dead for "all" VM ioctls seems like a fool's errand. So I'm leaning toward keeping "KVM: Reject ioctls only if the VM is bugged, not simply marked dead" (with a different shortlog+changelog), but keeping vm_dead (and not introducing kvm_tdx.vm_terminated).
Sounds good to me. With kvm_tdx.vm_terminated removed, we should consider adding a comment above the is_hkid_assigned() check in tdx_sept_remove_private_spte() to clarify that !is_hkid_assigned() indicates the guest has been terminated, allowing private pages to be reclaimed directly without zapping.