Re: [PATCH v2] KVM: x86/mmu: consider the hva in mmu_notifier retry
From: Sean Christopherson <seanjc@google.com>
Date: 2021-01-27 18:24:30
Also in:
kvm, kvmarm, linux-arm-kernel, lkml
On Wed, Jan 27, 2021, David Stevens wrote:
From: David Stevens <redacted> Track the range being invalidated by mmu_notifier and skip page fault retries if the fault address is not affected by the in-progress invalidation. Handle concurrent invalidations by finding the minimal range which includes all ranges being invalidated. Although the combined range may include unrelated addresses and cannot be shrunk as individual invalidation operations complete, it is unlikely the marginal gains of proper range tracking are worth the additional complexity. The primary benefit of this change is the reduction in the likelihood of extreme latency when handing a page fault due to another thread having been preempted while modifying host virtual addresses. Signed-off-by: David Stevens <redacted> --- v1 -> v2: - improve handling of concurrent invalidation requests by unioning ranges, instead of just giving up and using [0, ULONG_MAX).
Ooh, even better.
- add lockdep check - code comments and formatting arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +- arch/powerpc/kvm/book3s_64_mmu_radix.c | 2 +- arch/x86/kvm/mmu/mmu.c | 16 ++++++++------ arch/x86/kvm/mmu/paging_tmpl.h | 7 ++++--- include/linux/kvm_host.h | 27 +++++++++++++++++++++++- virt/kvm/kvm_main.c | 29 ++++++++++++++++++++++---- 6 files changed, 67 insertions(+), 16 deletions(-)
...
quoted hunk ↗ jump to hunk
@@ -3717,7 +3720,8 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, mmu_seq = vcpu->kvm->mmu_notifier_seq; smp_rmb(); - if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, write, &map_writable)) + if (try_async_pf(vcpu, prefault, gfn, gpa, &pfn, &hva, + write, &map_writable)) return RET_PF_RETRY; if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))@@ -3725,7 +3729,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code, r = RET_PF_RETRY; spin_lock(&vcpu->kvm->mmu_lock); - if (mmu_notifier_retry(vcpu->kvm, mmu_seq)) + if (mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
'hva' will be uninitialized at this point if the gfn did not resolve to a memslot, i.e. when handling an MMIO page fault. On the plus side, that's an opportunity for another optimization as there is no need to retry MMIO page faults on mmu_notifier invalidations. Including the attached patch as a preqreq to this will avoid consuming an uninitialized 'hva'.
goto out_unlock; r = make_mmu_pages_available(vcpu); if (r)
...
quoted hunk ↗ jump to hunk
void kvm_release_pfn_clean(kvm_pfn_t pfn); void kvm_release_pfn_dirty(kvm_pfn_t pfn);@@ -1203,6 +1206,28 @@ static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq) return 1; return 0; } + +static inline int mmu_notifier_retry_hva(struct kvm *kvm, + unsigned long mmu_seq, + unsigned long hva) +{ +#ifdef CONFIG_LOCKDEP + lockdep_is_held(&kvm->mmu_lock);
No need to manually do the #ifdef, just use lockdep_assert_held instead of lockdep_is_held.
+#endif + /* + * If mmu_notifier_count is non-zero, then the range maintained by + * kvm_mmu_notifier_invalidate_range_start contains all addresses that + * might be being invalidated. Note that it may include some false + * positives, due to shortcuts when handing concurrent invalidations. + */ + if (unlikely(kvm->mmu_notifier_count) && + kvm->mmu_notifier_range_start <= hva && + hva < kvm->mmu_notifier_range_end)
Uber nit: I find this easier to read if 'hva' is on the left-hand side for both checks, i.e. if (unlikely(kvm->mmu_notifier_count) && hva >= kvm->mmu_notifier_range_start && hva < kvm->mmu_notifier_range_end)
+ return 1; + if (kvm->mmu_notifier_seq != mmu_seq) + return 1; + return 0; +} #endif #ifdef CONFIG_HAVE_KVM_IRQ_ROUTING
Attachments
- 0001-KVM-x86-mmu-Skip-mmu_notifier-check-when-handling-MM.patch [text/x-diff] 2069 bytes · preview