Re: [PATCH 10/18] KVM: Move x86's MMU notifier memslot walkers to generic code

[PATCH 00/18] KVM: Consolidate and optimize MMU notifiers · Sean Christopherson <seanjc@google.com> · 2021-03-26
[PATCH 01/18] KVM: x86/mmu: Coalesce TDP MMU TLB flushes when zapping collapsible SPTEs · Sean Christopherson <seanjc@google.com> · 2021-03-26
[PATCH 02/18] KVM: x86/mmu: Move flushing for "slot" handlers to caller for legacy MMU · Sean Christopherson <seanjc@google.com> · 2021-03-26
[PATCH 03/18] KVM: x86/mmu: Coalesce TLB flushes when zapping collapsible SPTEs · Sean Christopherson <seanjc@google.com> · 2021-03-26
[PATCH 05/18] KVM: x86/mmu: Pass address space ID to __kvm_tdp_mmu_zap_gfn_range() · Sean Christopherson <seanjc@google.com> · 2021-03-26
[PATCH 04/18] KVM: x86/mmu: Coalesce TLB flushes across address spaces for gfn range zap · Sean Christopherson <seanjc@google.com> · 2021-03-26
[PATCH 07/18] KVM: x86/mmu: Use leaf-only loop for walking TDP SPTEs when changing SPTE · Sean Christopherson <seanjc@google.com> · 2021-03-26
[PATCH 06/18] KVM: x86/mmu: Pass address space ID to TDP MMU root walkers · Sean Christopherson <seanjc@google.com> · 2021-03-26
[PATCH 08/18] KVM: Move prototypes for MMU notifier callbacks to generic code · Sean Christopherson <seanjc@google.com> · 2021-03-26
[PATCH 09/18] KVM: Move arm64's MMU notifier trace events to generic code · Sean Christopherson <seanjc@google.com> · 2021-03-26
[PATCH 12/18] KVM: MIPS/MMU: Convert to the gfn-based MMU notifier callbacks · Sean Christopherson <seanjc@google.com> · 2021-03-26
Re: [PATCH 12/18] KVM: MIPS/MMU: Convert to the gfn-based MMU notifier callbacks · Paolo Bonzini <pbonzini@redhat.com> · 2021-03-31
[PATCH 10/18] KVM: Move x86's MMU notifier memslot walkers to generic code · Sean Christopherson <seanjc@google.com> · 2021-03-26
Re: [PATCH 10/18] KVM: Move x86's MMU notifier memslot walkers to generic code · Paolo Bonzini <pbonzini@redhat.com> · 2021-03-31
Re: [PATCH 10/18] KVM: Move x86's MMU notifier memslot walkers to generic code · Sean Christopherson <seanjc@google.com> · 2021-03-31
Re: [PATCH 10/18] KVM: Move x86's MMU notifier memslot walkers to generic code · Paolo Bonzini <pbonzini@redhat.com> · 2021-03-31
[PATCH 11/18] KVM: arm64: Convert to the gfn-based MMU notifier callbacks · Sean Christopherson <seanjc@google.com> · 2021-03-26
[PATCH 14/18] KVM: Kill off the old hva-based MMU notifier callbacks · Sean Christopherson <seanjc@google.com> · 2021-03-26
[PATCH 13/18] KVM: PPC: Convert to the gfn-based MMU notifier callbacks · Sean Christopherson <seanjc@google.com> · 2021-03-26
[PATCH 15/18] KVM: Take mmu_lock when handling MMU notifier iff the hva hits a memslot · Sean Christopherson <seanjc@google.com> · 2021-03-26
[PATCH 18/18] KVM: x86/mmu: Drop trace_kvm_age_page() tracepoint · Sean Christopherson <seanjc@google.com> · 2021-03-26
[PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Sean Christopherson <seanjc@google.com> · 2021-03-26
Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Paolo Bonzini <pbonzini@redhat.com> · 2021-03-31
Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Paolo Bonzini <pbonzini@redhat.com> · 2021-03-31
Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Sean Christopherson <seanjc@google.com> · 2021-03-31
Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Paolo Bonzini <pbonzini@redhat.com> · 2021-03-31
Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Sean Christopherson <seanjc@google.com> · 2021-03-31
Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Paolo Bonzini <pbonzini@redhat.com> · 2021-03-31
Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Sean Christopherson <seanjc@google.com> · 2021-03-31
Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Sean Christopherson <seanjc@google.com> · 2021-03-31
Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Paolo Bonzini <pbonzini@redhat.com> · 2021-03-31
Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Paolo Bonzini <pbonzini@redhat.com> · 2021-03-31
Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Sean Christopherson <seanjc@google.com> · 2021-03-31
Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Sean Christopherson <seanjc@google.com> · 2021-03-31
Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Paolo Bonzini <pbonzini@redhat.com> · 2021-03-31
Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Sean Christopherson <seanjc@google.com> · 2021-03-31
Re: [PATCH 16/18] KVM: Don't take mmu_lock for range invalidation unless necessary · Paolo Bonzini <pbonzini@redhat.com> · 2021-03-31
[PATCH 17/18] KVM: x86/mmu: Allow yielding during MMU notifier unmap/zap, if possible · Sean Christopherson <seanjc@google.com> · 2021-03-26
Re: [PATCH 00/18] KVM: Consolidate and optimize MMU notifiers · Ben Gardon <hidden> · 2021-03-30
Re: [PATCH 00/18] KVM: Consolidate and optimize MMU notifiers · Paolo Bonzini <pbonzini@redhat.com> · 2021-03-30
Re: [PATCH 00/18] KVM: Consolidate and optimize MMU notifiers · Sean Christopherson <seanjc@google.com> · 2021-03-30
Re: [PATCH 00/18] KVM: Consolidate and optimize MMU notifiers · Paolo Bonzini <pbonzini@redhat.com> · 2021-03-31
Re: [PATCH 00/18] KVM: Consolidate and optimize MMU notifiers · Marc Zyngier <maz@kernel.org> · 2021-03-31
Re: [PATCH 00/18] KVM: Consolidate and optimize MMU notifiers · Paolo Bonzini <pbonzini@redhat.com> · 2021-03-31

From: Sean Christopherson <seanjc@google.com>
Date: 2021-03-31 16:21:31
Also in: kvm, kvmarm, linux-arm-kernel, lkml

On Wed, Mar 31, 2021, Paolo Bonzini wrote:

On 26/03/21 03:19, Sean Christopherson wrote:

quoted

+#ifdef KVM_ARCH_WANT_NEW_MMU_NOTIFIER_APIS
+	kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
+#else
  	struct kvm *kvm = mmu_notifier_to_kvm(mn);
  	int idx;
 	trace_kvm_set_spte_hva(address);
	idx = srcu_read_lock(&kvm->srcu);

	KVM_MMU_LOCK(kvm);

	kvm->mmu_notifier_seq++;

	if (kvm_set_spte_hva(kvm, address, pte))
		kvm_flush_remote_tlbs(kvm);

  	KVM_MMU_UNLOCK(kvm);
  	srcu_read_unlock(&kvm->srcu, idx);
+#endif

The kvm->mmu_notifier_seq is missing in the new API side.  I guess you can
add an argument to __kvm_handle_hva_range and handle it also in patch 15
("KVM: Take mmu_lock when handling MMU notifier iff the hva hits a
memslot").

Yikes.  Superb eyes!

That does bring up an oddity I discovered when digging into this.  Every call
to .change_pte() is bookended by .invalidate_range_{start,end}(), i.e. the above
missing kvm->mmu_notifier_seq++ is benign because kvm->mmu_notifier_count is
guaranteed to be non-zero.

I'm also fairly certain it means kvm_set_spte_gfn() is effectively dead code on
_all_ architectures.  x86 and MIPS are clearcut nops if the old SPTE is
not-present, and that's guaranteed due to the prior invalidation.  PPC simply
unmaps the SPTE, which again should be a nop due to the invalidation.  arm64 is
a bit murky, but if I'm reading the code correctly, it's also a nop because
kvm_pgtable_stage2_map() is called without a cache pointer, which I think means
it will map an entry if and only if an existing PTE was found.

I haven't actually tested the above analysis, e.g. by asserting that
kvm->mmu_notifier_count is indeed non-zero.  I'll do that sooner than later.
But, given the shortlog of commit:

  6bdb913f0a70 ("mm: wrap calls to set_pte_at_notify with invalidate_range_start
                 and invalidate_range_end")

I'm fairly confident my analysis is correct.  And if so, it also means that the
whole point of adding .change_pte() in the first place (for KSM, commit
828502d30073, "ksm: add mmu_notifier set_pte_at_notify()"), has since been lost.

When it was originally added, .change_pte() was a pure alternative to
invalidating the entry.

  void __mmu_notifier_change_pte(struct mm_struct *mm, unsigned long address,
                               pte_t pte)
  {
        struct mmu_notifier *mn;
        struct hlist_node *n;

        rcu_read_lock();
        hlist_for_each_entry_rcu(mn, n, &mm->mmu_notifier_mm->list, hlist) {
                if (mn->ops->change_pte)
                        mn->ops->change_pte(mn, mm, address, pte);
                /*
                 * Some drivers don't have change_pte,
                 * so we must call invalidate_page in that case.
                 */
                else if (mn->ops->invalidate_page)
                        mn->ops->invalidate_page(mn, mm, address);
        }
        rcu_read_unlock();
  }

The aforementioned commit 6bdb913f0a70 wrapped set_pte_at_notify() with
invalidate_range_{start,end}() so that .invalidate_page() implementations could
sleep.  But, no one noticed that in doing so, .change_pte() was completely
neutered.

Assuming all of the above is correct, I'm very tempted to rip out .change_pte()
entirely.  It's been dead weight for 8+ years and no one has complained about
KSM+KVM performance (I'd also be curious to know how much performance was gained
by shaving VM-Exits).  As KVM is the only user of .change_pte(), dropping it in
KVM would mean the entire MMU notifier could also go away.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help