Re: [PATCH v2 24/43] KVM: VMX: Drop pointless PI.NDST update when blocking
From: Sean Christopherson <seanjc@google.com>
Date: 2021-10-27 14:26:52
Also in:
kvm, kvm-riscv, kvmarm, linux-arm-kernel, linux-riscv, lkml
On Mon, Oct 25, 2021, Paolo Bonzini wrote:
On 09/10/21 04:12, Sean Christopherson wrote:quoted
Don't update Posted Interrupt's NDST, a.k.a. the target pCPU, in the pre-block path, as NDST is guaranteed to be up-to-date. The comment about the vCPU being preempted during the update is simply wrong, as the update path runs with IRQs disabled (from before snapshotting vcpu->cpu, until after the update completes).Right, it didn't as of commit bf9f6ac8d74969690df1485b33b7c238ca9f2269 (when VT-d posted interrupts were introduced). The interrupt disable/enable pair was added in the same commit that motivated the introduction of the sanity checks:
Ya, I found that commit when digging around for different commit in the series and forgot to come back to this changelog. I'll incorporate this info into the next version.
commit 8b306e2f3c41939ea528e6174c88cfbfff893ce1 Author: Paolo Bonzini [off-list ref] Date: Tue Jun 6 12:57:05 2017 +0200 KVM: VMX: avoid double list add with VT-d posted interrupts In some cases, for example involving hot-unplug of assigned devices, pi_post_block can forget to remove the vCPU from the blocked_vcpu_list. When this happens, the next call to pi_pre_block corrupts the list. Fix this in two ways. First, check vcpu->pre_pcpu in pi_pre_block and WARN instead of adding the element twice in the list. Second, always do the list removal in pi_post_block if vcpu->pre_pcpu is set (not -1). The new code keeps interrupts disabled for the whole duration of pi_pre_block/pi_post_block. This is not strictly necessary, but easier to follow. For the same reason, PI.ON is checked only after the cmpxchg, and to handle it we just call the post-block code. This removes duplication of the list removal code. At the time, I didn't notice the now useless NDST update. Paoloquoted
The vCPU can get preempted_before_ the update starts, but not during. And if the vCPU is preempted before, vmx_vcpu_pi_load() is responsible for updating NDST when the vCPU is scheduled back in. In that case, the check against the wakeup vector in vmx_vcpu_pi_load() cannot be true as that would require the notification vector to have been set to the wakeup vector_before_ blocking. Opportunistically switch to using vcpu->cpu for the list/lock lookups, which presumably used pre_pcpu only for some phantom preemption logic.