Thread (11 messages) 11 messages, 2 authors, 2021-04-01

Re: [PATCH v2] KVM: arm64: Initialize VCPU mdcr_el2 before loading it

From: Alexandru Elisei <hidden>
Date: 2021-03-30 17:14:45
Also in: kvmarm

Hi Marc,

Thanks for having a look!

On 3/30/21 10:55 AM, Marc Zyngier wrote:
Hi Alex,

On Tue, 23 Mar 2021 18:00:57 +0000,
Alexandru Elisei [off-list ref] wrote:
quoted
When a VCPU is created, the kvm_vcpu struct is initialized to zero in
kvm_vm_ioctl_create_vcpu(). On VHE systems, the first time
vcpu.arch.mdcr_el2 is loaded on hardware is in vcpu_load(), before it is
set to a sensible value in kvm_arm_setup_debug() later in the run loop. The
result is that KVM executes for a short time with MDCR_EL2 set to zero.

This has several unintended consequences:

* Setting MDCR_EL2.HPMN to 0 is constrained unpredictable according to ARM
  DDI 0487G.a, page D13-3820. The behavior specified by the architecture
  in this case is for the PE to behave as if MDCR_EL2.HPMN is set to a
  value less than or equal to PMCR_EL0.N, which means that an unknown
  number of counters are now disabled by MDCR_EL2.HPME, which is zero.

* The host configuration for the other debug features controlled by
  MDCR_EL2 is temporarily lost. This has been harmless so far, as Linux
  doesn't use the other fields, but that might change in the future.

Let's avoid both issues by initializing the VCPU's mdcr_el2 field in
kvm_vcpu_vcpu_first_run_init(), thus making sure that the MDCR_EL2 register
has a consistent value after each vcpu_load().

Signed-off-by: Alexandru Elisei <redacted>
This looks strangely similar to 4942dc6638b0 ("KVM: arm64: Write
arch.mdcr_el2 changes since last vcpu_load on VHE"), just at a
different point. Probably worth a Fixes tag.
This bug is present in the commit you are mentioning, and from what I can tell
it's also present in the commit it's fixing (d5a21bcc2995 ("KVM: arm64: Move
common VHE/non-VHE trap config in separate functions")) - vcpu->arch.mdcr_el2 is
computed in kvm_arm_setup_debug(), which is called after vcpu_load(). My guess is
that this bug is from VHE support was added (or soon after).

I can dig further, how far back in time should I aim for?
quoted
---
Found by code inspection. Based on v5.12-rc4.

Tested on an odroid-c4 with VHE. vcpu->arch.mdcr_el2 is calculated to be
0x4e66. Without this patch, reading MDCR_EL2 after the first vcpu_load() in
kvm_arch_vcpu_ioctl_run() returns 0; with this patch it returns the correct
value, 0xe66 (FEAT_SPE is not implemented by the PE).

This patch was initially part of the KVM SPE series [1], but those patches
haven't seen much activity, so I thought it would be a good idea to send
this patch separately to draw more attention to it.

Changes in v2:
* Moved kvm_arm_vcpu_init_debug() earlier in kvm_vcpu_first_run_init() so
  vcpu->arch.mdcr_el2 is calculated even if kvm_vgic_map_resources() fails.
* Added comment to kvm_arm_setup_mdcr_el2 to explain what testing
  vcpu->guest_debug means.

[1] https://www.spinics.net/lists/kvm-arm/msg42959.html

 arch/arm64/include/asm/kvm_host.h |  1 +
 arch/arm64/kvm/arm.c              |  3 +-
 arch/arm64/kvm/debug.c            | 82 +++++++++++++++++++++----------
 3 files changed, 59 insertions(+), 27 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 3d10e6527f7d..858c2fcfc043 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -713,6 +713,7 @@ static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
 static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
 
 void kvm_arm_init_debug(void);
+void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 7f06ba76698d..7088d8fe7186 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -580,6 +580,8 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
 
 	vcpu->arch.has_run_once = true;
 
+	kvm_arm_vcpu_init_debug(vcpu);
+
 	if (likely(irqchip_in_kernel(kvm))) {
 		/*
 		 * Map the VGIC hardware resources before running a vcpu the
@@ -791,7 +793,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
 		}
 
 		kvm_arm_setup_debug(vcpu);
-
Spurious change?
Definitely, thank you for spotting it.
quoted
 		/**************************************************************
 		 * Enter the guest
 		 */
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 7a7e425616b5..3626d03354f6 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -68,6 +68,60 @@ void kvm_arm_init_debug(void)
 	__this_cpu_write(mdcr_el2, kvm_call_hyp_ret(__kvm_get_mdcr_el2));
 }
 
+/**
+ * kvm_arm_setup_mdcr_el2 - configure vcpu mdcr_el2 value
+ *
+ * @vcpu:	the vcpu pointer
+ * @host_mdcr:  host mdcr_el2 value
+ *
+ * This ensures we will trap access to:
+ *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
+ *  - Debug ROM Address (MDCR_EL2_TDRA)
+ *  - OS related registers (MDCR_EL2_TDOSA)
+ *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
+ */
+static void kvm_arm_setup_mdcr_el2(struct kvm_vcpu *vcpu, u32 host_mdcr)
+{
+	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
+
+	/*
+	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
+	 * to the profiling buffer.
+	 */
+	vcpu->arch.mdcr_el2 = host_mdcr & MDCR_EL2_HPMN_MASK;
+	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
+				MDCR_EL2_TPMS |
+				MDCR_EL2_TPMCR |
+				MDCR_EL2_TDRA |
+				MDCR_EL2_TDOSA);
+
+	/* Is the VM being debugged by userspace? */
+	if (vcpu->guest_debug) {
+		/* Route all software debug exceptions to EL2 */
+		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
+		if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW)
+			trap_debug = true;
+	}
+
+	/* Trap debug register access */
+	if (trap_debug)
+		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
+
+	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
+}
+
+/**
+ * kvm_arm_vcpu_init_debug - setup vcpu debug traps
+ *
+ * @vcpu:	the vcpu pointer
+ *
+ * Set vcpu initial mdcr_el2 value.
+ */
+void kvm_arm_vcpu_init_debug(struct kvm_vcpu *vcpu)
+{
+	kvm_arm_setup_mdcr_el2(vcpu, this_cpu_read(mdcr_el2));
Given that kvm_arm_setup_mdcr_el2() always takes the current host
value for mdcr_el2, why not moving the read into it and be done with
it?
kvm_arm_setup_debug() is called with preemption disabled, and it can use
__this_cpu_read(). kvm_arm_vcpu_init_debug() is called with preemption enabled, so
it must use this_cpu_read(). I wanted to make the distinction because
kvm_arm_setup_debug() is in the run loop.
Also, do we really need an extra wrapper?
I can remove the wrapper and have kvm_arm_setup_mdcr_el2() use this_cpu_read() for
the host's mdcr_el2 value at the cost of a preempt disable/enable in the run loop
when preemption is disabled. If you think that would make the code easier to
follow, I can certainly do that.
quoted
+}
+
 /**
  * kvm_arm_reset_debug_ptr - reset the debug ptr to point to the vcpu state
  */
@@ -83,12 +137,7 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
  * @vcpu:	the vcpu pointer
  *
  * This is called before each entry into the hypervisor to setup any
- * debug related registers. Currently this just ensures we will trap
- * access to:
- *  - Performance monitors (MDCR_EL2_TPM/MDCR_EL2_TPMCR)
- *  - Debug ROM Address (MDCR_EL2_TDRA)
- *  - OS related registers (MDCR_EL2_TDOSA)
- *  - Statistical profiler (MDCR_EL2_TPMS/MDCR_EL2_E2PB)
+ * debug related registers.
  *
  * Additionally, KVM only traps guest accesses to the debug registers if
  * the guest is not actively using them (see the KVM_ARM64_DEBUG_DIRTY
@@ -100,27 +149,14 @@ void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu)
 
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 {
-	bool trap_debug = !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY);
 	unsigned long mdscr, orig_mdcr_el2 = vcpu->arch.mdcr_el2;
 
 	trace_kvm_arm_setup_debug(vcpu, vcpu->guest_debug);
 
-	/*
-	 * This also clears MDCR_EL2_E2PB_MASK to disable guest access
-	 * to the profiling buffer.
-	 */
-	vcpu->arch.mdcr_el2 = __this_cpu_read(mdcr_el2) & MDCR_EL2_HPMN_MASK;
-	vcpu->arch.mdcr_el2 |= (MDCR_EL2_TPM |
-				MDCR_EL2_TPMS |
-				MDCR_EL2_TPMCR |
-				MDCR_EL2_TDRA |
-				MDCR_EL2_TDOSA);
+	kvm_arm_setup_mdcr_el2(vcpu, __this_cpu_read(mdcr_el2));
 
 	/* Is Guest debugging in effect? */
 	if (vcpu->guest_debug) {
-		/* Route all software debug exceptions to EL2 */
-		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDE;
-
 		/* Save guest debug state */
 		save_guest_debug_regs(vcpu);
 
@@ -174,7 +210,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 
 			vcpu->arch.debug_ptr = &vcpu->arch.external_debug_state;
 			vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
-			trap_debug = true;
There is something that slightly worries me here: there is now a
disconnect between flagging debug as dirty and setting the
trapping. And actually, you now check for KVM_ARM64_DEBUG_DIRTY and
set the trap bits *before* setting the dirty bit itself.

Here, I believe you end up with guest/host confusion of breakpoints,
which isn't great. Or did I miss something?
I'm sorry, but I don't understand what you mean. This is my understanding of what
is happening.

Without this patch, trap_debug is set to true and the KVM_ARM64_DEBUG_DIRTY flag
is set if vcpu->guest_debug & KVM_GUESTDBG_USE_HW. Further down, trap debug is
only used when computing mdcr_el2.

With this patch, trap_debug is set to true if vcpu->guest_debug &
KVM_GUESTDBG_USE_HW and it's also used for computing mdcr_el2, but this happens in
kvm_arm_setup_mdcr_el2(), which is called at the start of kvm_arm_setup_debug().
The KVM_ARM_DEBUG_DIRTY flags is still set in kvm_arm_setup_debug() if
vcpu->guest_debug & KVM_GUESTDBG_USE_HW, like before.

The guest never runs with the value computed in kvm_vcpu_first_run_init() unless
it's identical with the value recomputed in kvm_arm_setup_debug().

The only difference I see is that mdcr_el2 is computed at the start of
kvm_arm_setup_debug(). I get the feeling I'm also missing something.

Thanks,

Alex
quoted
 
 			trace_kvm_arm_set_regset("BKPTS", get_num_brps(),
 						&vcpu->arch.debug_ptr->dbg_bcr[0],
@@ -189,10 +224,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 	BUG_ON(!vcpu->guest_debug &&
 		vcpu->arch.debug_ptr != &vcpu->arch.vcpu_debug_state);
 
-	/* Trap debug register access */
-	if (trap_debug)
-		vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
-
 	/* If KDE or MDE are set, perform a full save/restore cycle. */
 	if (vcpu_read_sys_reg(vcpu, MDSCR_EL1) & (DBG_MDSCR_KDE | DBG_MDSCR_MDE))
 		vcpu->arch.flags |= KVM_ARM64_DEBUG_DIRTY;
@@ -201,7 +232,6 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
 	if (has_vhe() && orig_mdcr_el2 != vcpu->arch.mdcr_el2)
 		write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2);
 
-	trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2);
 	trace_kvm_arm_set_dreg32("MDSCR_EL1", vcpu_read_sys_reg(vcpu, MDSCR_EL1));
 }
Thanks,

	M.
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help