Re: [PATCH v2] KVM: arm64: Initialize VCPU mdcr_el2 before loading it
From: Marc Zyngier <maz@kernel.org>
Date: 2021-03-30 20:10:36
Also in:
kvmarm
On Tue, 30 Mar 2021 18:13:07 +0100, Alexandru Elisei [off-list ref] wrote:
Hi Marc, Thanks for having a look! On 3/30/21 10:55 AM, Marc Zyngier wrote:quoted
Hi Alex, On Tue, 23 Mar 2021 18:00:57 +0000, Alexandru Elisei [off-list ref] wrote:quoted
When a VCPU is created, the kvm_vcpu struct is initialized to zero in kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is set to a sensible value in kvm_arm_setup_debug() later in the run loop. The result is that KVM executes for a short time with MDCR_EL2 set to zero. This has several unintended consequences: * Setting MDCR_EL2.HPMN to 0 is constrained unpredictable according to ARM DDI 0487G.a, page D13-3820. The behavior specified by the architecture in this case is for the PE to behave as if MDCR_EL2.HPMN is set to a value less than or equal to PMCR_EL0.N, which means that an unknown number of counters are now disabled by MDCR_EL2.HPME, which is zero. * The host configuration for the other debug features controlled by MDCR_EL2 is temporarily lost. This has been harmless so far, as Linux doesn't use the other fields, but that might change in the future. Let's avoid both issues by initializing the VCPU's mdcr_el2 field in kvm_vcpu_vcpu_first_run_init(), thus making sure that the MDCR_EL2 register has a consistent value after each vcpu_load(). Signed-off-by: Alexandru Elisei <redacted>This looks strangely similar to 4942dc6638b0 ("KVM: arm64: Write arch.mdcr_el2 changes since last vcpu_load on VHE"), just at a different point. Probably worth a Fixes tag.This bug is present in the commit you are mentioning, and from what I can tell it's also present in the commit it's fixing (d5a21bcc2995 ("KVM: arm64: Move common VHE/non-VHE trap config in separate functions")) - vcpu->arch.mdcr_el2 is computed in kvm_arm_setup_debug(), which is called after vcpu_load(). My guess is that this bug is from VHE support was added (or soon after).
Right. Can you please add a Fixes: tag for the same commit? At least that'd be consistent.
I can dig further, how far back in time should I aim for?quoted
quoted
--- Found by code inspection. Based on v5.12-rc4. Tested on an odroid-c4 with VHE. vcpu->arch.mdcr_el2 is calculated to be 0x4e66. Without this patch, reading MDCR_EL2 after the first vcpu_load() in kvm_arch_vcpu_ioctl_run() returns 0; with this patch it returns the correct value, 0xe66 (FEAT_SPE is not implemented by the PE). This patch was initially part of the KVM SPE series [1], but those patches haven't seen much activity, so I thought it would be a good idea to send this patch separately to draw more attention to it. Changes in v2: * Moved kvm_arm_vcpu_init_debug() earlier in kvm_vcpu_first_run_init() so vcpu->arch.mdcr_el2 is calculated even if kvm_vgic_map_resources() fails. * Added comment to kvm_arm_setup_mdcr_el2 to explain what testing vcpu->guest_debug means. [1] https://www.spinics.net/lists/kvm-arm/msg42959.html arch/arm64/include/asm/kvm_host.h | 1 + arch/arm64/kvm/arm.c | 3 +- arch/arm64/kvm/debug.c | 82 +++++++++++++++++++++---------- 3 files changed, 59 insertions(+), 27 deletions(-)diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 3d10e6527f7d..858c2fcfc043 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h@@ -713,6 +713,7 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {} static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {} void kvm_arm_init_debug(void); +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu); void kvm_arm_setup_debug(struct kvm_vcpu *vcpu); void kvm_arm_clear_debug(struct kvm_vcpu *vcpu); void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 7f06ba76698d..7088d8fe7186 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c@@ -580,6 +580,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu) vcpu->arch.has_run_once = true; + kvm_arm_vcpu_init_debug(vcpu); + if (likely(irqchip_in_kernel(kvm))) { /* * Map the VGIC hardware resources before running a vcpu the@@ -791,7 +793,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu) } kvm_arm_setup_debug(vcpu); -Spurious change?Definitely, thank you for spotting it.quoted
quoted
/************************************************************** * Enter the guest */diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c index 7a7e425616b5..3626d03354f6 100644 --- a/arch/arm64/kvm/debug.c +++ b/arch/arm64/kvm/debug.c@@ -68,6 +68,60 @@ void kvm_arm_init_debug(void) __this_cpu_write(mdcr_el2, kvm_call_hyp_ret(__kvm_get_mdcr_el2)); } +/** + * kvm_arm_setup_mdcr_el2 - configure vcpu mdcr_el2 value + * + * @vcpu: the vcpu pointer + * @host_mdcr: host mdcr_el2 value + * + * This ensures we will trap access to: + * - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR) + * - Debug ROM Address (MDCR_EL2_TDRA) + * - OS related registers (MDCR_EL2_TDOSA) + * - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB) + */ +static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr) +{ + bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY); + + /* + * This also clears MDCR_EL2_E2PB_MASK to disable guest access + * to the profiling buffer. + */ + vcpu->arch.mdcr_el2 = host_mdcr & MDCR_EL2_HPMN_MASK; + vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM | + MDCR_EL2_TPMS | + MDCR_EL2_TPMCR | + MDCR_EL2_TDRA | + MDCR_EL2_TDOSA); + + /* Is the VM being debugged by userspace? */ + if (vcpu->guest_debug) { + /* Route all software debug exceptions to EL2 */ + vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE; + if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW) + trap_debug = true; + } + + /* Trap debug register access */ + if (trap_debug) + vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA; + + trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2); +} + +/** + * kvm_arm_vcpu_init_debug - setup vcpu debug traps + * + * @vcpu: the vcpu pointer + * + * Set vcpu initial mdcr_el2 value. + */ +void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu) +{ + kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));Given that kvm_arm_setup_mdcr_el2() always takes the current host value for mdcr_el2, why not moving the read into it and be done with it?kvm_arm_setup_debug() is called with preemption disabled, and it can use __this_cpu_read(). kvm_arm_vcpu_init_debug() is called with preemption enabled, so it must use this_cpu_read(). I wanted to make the distinction because kvm_arm_setup_debug() is in the run loop.
I think it would be absolutely fine to make the slow path of kvm_vcpu_first_run_init() run with preempt disabled. This happens so rarely that that it isn't worth thinking about it. Please give it a lockdep run though! ;-)
quoted
Also, do we really need an extra wrapper?I can remove the wrapper and have kvm_arm_setup_mdcr_el2() use this_cpu_read() for the host's mdcr_el2 value at the cost of a preempt disable/enable in the run loop when preemption is disabled. If you think that would make the code easier to follow, I can certainly do that.
As explained above, I'd rather you keep the __this_cpu_read() and make kvm_vcpu_first_run_init() preemption safe. Thanks, M. -- Without deviation from the norm, progress is not possible. _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel