Thread (78 messages) 78 messages, 9 authors, 2025-11-26

Re: [PATCH v5 09/44] perf/x86: Switch LVTPC to/from mediated PMI vector on guest load/put context

From: Sean Christopherson <seanjc@google.com>
Date: 2025-08-18 15:25:36
Also in: kvm, kvm-riscv, kvmarm, linux-perf-users, linux-riscv, lkml, loongarch

On Mon, Aug 18, 2025, Peter Zijlstra wrote:
On Fri, Aug 15, 2025 at 08:55:25AM -0700, Sean Christopherson wrote:
quoted
On Fri, Aug 15, 2025, Sean Christopherson wrote:
quoted
On Fri, Aug 15, 2025, Peter Zijlstra wrote:
quoted
quoted
diff --git a/kernel/events/core.c b/kernel/events/core.c
index e1df3c3bfc0d..ad22b182762e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6408,6 +6408,8 @@ void perf_load_guest_context(unsigned long data)
 		task_ctx_sched_out(cpuctx->task_ctx, NULL, EVENT_GUEST);
 	}
 
+	arch_perf_load_guest_context(data);
So I still don't understand why this ever needs to reach the generic
code. x86 pmu driver and x86 kvm can surely sort this out inside of x86,
no?
It's definitely possible to handle this entirely within x86, I just don't love
switching the LVTPC without the protection of perf_ctx_lock and perf_ctx_disable().
It's not a sticking point for me if you strongly prefer something like this: 
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 0e5048ae86fa..86b81c217b97 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -1319,7 +1319,9 @@ void kvm_mediated_pmu_load(struct kvm_vcpu *vcpu)
 
        lockdep_assert_irqs_disabled();
 
-       perf_load_guest_context(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTPC));
+       perf_load_guest_context();
+
+       perf_load_guest_lvtpc(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTPC));
Hmm, an argument for providing a dedicated perf_load_guest_lvtpc() APIs is that
it would allow KVM to handle LVTPC writes in KVM's VM-Exit fastpath, i.e. without
having to do a full put+reload of the guest context.

So if we're confident that switching the host LVTPC outside of
perf_{load,put}_guest_context() is functionally safe, I'm a-ok with it.
Let me see. So the hardware sets Masked when it raises the interrupt.

The interrupt handler clears it from software -- depending on uarch in 3
different places:
 1) right at the start of the PMI
 2) in the middle, right before enabling the PMU (writing global control)
 3) at the end of the PMI

the various changelogs adding that code mention spurious PMIs and
malformed PEBS records.

So the fun all happens when the guest is doing PMI and gets a VM-exit
while still Masked.

At that point, we can come in and completely rewrite the PMU state,
reroute the PMI and enable things again. Then later, we 'restore' the
PMU state, re-set LVTPC masked to the guest interrupt and 'resume'.

What could possibly go wrong :/ Kan, I'm assuming, but not knowing, that
writing all the PMU MSRs is somehow serializing state sufficient to not
cause the above mentioned fails? Specifically, clearing PEBS_ENABLE
should inhibit those malformed PEBS records or something? What if the
host also has PEBS and we don't actually clear the bit?

The current order ensures we rewrite LVTPC when global control is unset;
I think we want to keep that.
Yes, for sure.
While staring at this, I note that perf_load_guest_context() will clear
global ctrl, clear all the counter programming, and re-enable an empty
pmu. Now, an empty PMU should result in global control being zero --
there is nothing run after all.

But then kvm_mediated_pmu_load() writes an explicit 0 again. Perhaps
replace this with asserting it is 0 instead?
Yeah, I like that idea, a lot.  This?

	perf_load_guest_context();

	/*
	 * Sanity check that "loading" guest context disabled all counters, as
	 * modifying the LVTPC while host perf is active will cause explosions,
	 * as will loading event selectors and PMCs with guest values.
	 *
	 * VMX will enable/disable counters at VM-Enter/VM-Exit by atomically
	 * loading PERF_GLOBAL_CONTROL.  SVM effectively performs the switch by
	 * configuring all events to be GUEST_ONLY.
	 */
	WARN_ON_ONCE(rdmsrq(kvm_pmu_ops.PERF_GLOBAL_CTRL));

	perf_load_guest_lvtpc(kvm_lapic_get_reg(vcpu->arch.apic, APIC_LVTPC));
Anyway, this means that moving the LVTPC writing into
kvm_mediated_pmu_load() as you suggest is identical.
perf_load_guest_context() results in global control being 0, we then
assert it is 0, and write LVTPC while it is still 0.
kvm_pmu_load_guest_pmcs() will then frob the MSRs.

OK, so *IF* doing the VM-exit during PMI is sound, this is something
that needs a comment somewhere. 
I'm a bit lost here.  Are you essentially asking if it's ok to take a VM-Exit
while the guest is handling a PMI?  If so, that _has_ to work, because there are
myriad things that can/will trigger a VM-Exit at any point while the guest is
active.
Then going back again, is the easy part, since on the host side, we can never
transition into KVM during a PMI.
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help