Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback

[PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify() · Paolo Bonzini <pbonzini@redhat.com> · 2024-04-05
[PATCH 2/4] KVM: remove unused argument of kvm_handle_hva_range() · Paolo Bonzini <pbonzini@redhat.com> · 2024-04-05
Re: [PATCH 2/4] KVM: remove unused argument of kvm_handle_hva_range() · Philippe Mathieu-Daudé <hidden> · 2024-04-08
[PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Paolo Bonzini <pbonzini@redhat.com> · 2024-04-05
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Anup Patel <anup@brainfault.org> · 2024-04-07
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · maobibo <maobibo@loongson.cn> · 2024-04-08
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Michael Ellerman <mpe@ellerman.id.au> · 2024-04-08
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Peter Xu <peterx@redhat.com> · 2024-04-08
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Paolo Bonzini <pbonzini@redhat.com> · 2024-04-11
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Peter Xu <peterx@redhat.com> · 2024-04-11
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · David Hildenbrand <hidden> · 2024-04-12
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Will Deacon <will@kernel.org> · 2024-04-12
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Marc Zyngier <maz@kernel.org> · 2024-04-12
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Sean Christopherson <seanjc@google.com> · 2024-04-12
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Marc Zyngier <maz@kernel.org> · 2024-04-13
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Sean Christopherson <seanjc@google.com> · 2024-04-15
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Will Deacon <will@kernel.org> · 2024-04-18
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Sean Christopherson <seanjc@google.com> · 2024-04-18
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Will Deacon <will@kernel.org> · 2024-04-19
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Sean Christopherson <seanjc@google.com> · 2024-04-19
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Sean Christopherson <seanjc@google.com> · 2024-06-12
Re: [PATCH 1/4] KVM: delete .change_pte MMU notifier callback · Sean Christopherson <seanjc@google.com> · 2024-06-12
[PATCH 3/4] mmu_notifier: remove the .change_pte() callback · Paolo Bonzini <pbonzini@redhat.com> · 2024-04-05
Re: [PATCH 3/4] mmu_notifier: remove the .change_pte() callback · David Hildenbrand <hidden> · 2024-04-08
[PATCH 4/4] mm: replace set_pte_at_notify() with just set_pte_at() · Paolo Bonzini <pbonzini@redhat.com> · 2024-04-05
Re: [PATCH 4/4] mm: replace set_pte_at_notify() with just set_pte_at() · Philippe Mathieu-Daudé <hidden> · 2024-04-08
Re: [PATCH 4/4] mm: replace set_pte_at_notify() with just set_pte_at() · David Hildenbrand <hidden> · 2024-04-08
Re: [PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify() · Andrew Morton <akpm@linux-foundation.org> · 2024-04-10
Re: [PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify() · Paolo Bonzini <pbonzini@redhat.com> · 2024-04-11
Re: [PATCH 0/4] KVM, mm: remove the .change_pte() MMU notifier and set_pte_at_notify() · Marc Zyngier <maz@kernel.org> · 2024-04-12

From: Sean Christopherson <seanjc@google.com>
Date: 2024-04-12 14:54:24
Also in: kvm, kvm-riscv, kvmarm, linux-arm-kernel, linux-mips, linux-mm, linux-perf-users, linuxppc-dev, lkml, loongarch
Subsystem: kernel virtual machine (kvm), the rest · Maintainers: Paolo Bonzini, Linus Torvalds

On Fri, Apr 12, 2024, Marc Zyngier wrote:

On Fri, 12 Apr 2024 11:44:09 +0100, Will Deacon [off-list ref] wrote:

quoted

On Fri, Apr 05, 2024 at 07:58:12AM -0400, Paolo Bonzini wrote:
Also, if you're in the business of hacking the MMU notifier code, it
would be really great to change the .clear_flush_young() callback so
that the architecture could handle the TLB invalidation. At the moment,
the core KVM code invalidates the whole VMID courtesy of 'flush_on_ret'
being set by kvm_handle_hva_range(), whereas we could do a much
lighter-weight and targetted TLBI in the architecture page-table code
when we actually update the ptes for small ranges.

Indeed, and I was looking at this earlier this week as it has a pretty
devastating effect with NV (it blows the shadow S2 for that VMID, with
costly consequences).

In general, it feels like the TLB invalidation should stay with the
code that deals with the page tables, as it has a pretty good idea of
what needs to be invalidated and how -- specially on architectures
that have a HW-broadcast facility like arm64.

Would this be roughly on par with an in-line flush on arm64?  The simpler, more
straightforward solution would be to let architectures override flush_on_ret,
but I would prefer something like the below as x86 can also utilize a range-based
flush when running as a nested hypervisor.

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ff0a20565f90..b65116294efe 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c

@@ -601,6 +601,7 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
        struct kvm_gfn_range gfn_range;
        struct kvm_memory_slot *slot;
        struct kvm_memslots *slots;
+       bool need_flush = false;
        int i, idx;
 
        if (WARN_ON_ONCE(range->end <= range->start))

@@ -653,10 +654,22 @@ static __always_inline kvm_mn_ret_t __kvm_handle_hva_range(struct kvm *kvm,
                                        break;
                        }
                        r.ret |= range->handler(kvm, &gfn_range);
+
+                       /*
+                        * Use a precise gfn-based TLB flush when possible, as
+                        * most mmu_notifier events affect a small-ish range.
+                        * Fall back to a full TLB flush if the gfn-based flush
+                        * fails, and don't bother trying the gfn-based flush
+                        * if a full flush is already pending.
+                        */
+                       if (range->flush_on_ret && !need_flush && r.ret &&
+                           kvm_arch_flush_remote_tlbs_range(kvm, gfn_range.start
+                                                            gfn_range.end - gfn_range.start + 1))
+                               need_flush = true;
                }
        }
 
-       if (range->flush_on_ret && r.ret)
+       if (need_flush)
                kvm_flush_remote_tlbs(kvm);
 
        if (r.found_memslot)

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help