Thread (30 messages) 30 messages, 9 authors, 2016-07-15

Re: [PATCH v2 0/4] implement vcpu preempted check

From: Peter Zijlstra <peterz@infradead.org>
Date: 2016-07-07 09:42:40
Also in: kvm, linux-s390, linuxppc-dev, lkml
Subsystem: kernel virtual machine for x86 (kvm/x86), the rest, x86 architecture (32-bit and 64-bit) · Maintainers: Sean Christopherson, Paolo Bonzini, Linus Torvalds, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen

On Thu, Jul 07, 2016 at 04:48:05PM +0800, Wanpeng Li wrote:
2016-07-06 20:28 GMT+08:00 Paolo Bonzini [off-list ref]:
quoted
Hmm, you're right.  We can use bit 0 of struct kvm_steal_time's flags to
indicate that pad[0] is a "VCPU preempted" field; if pad[0] is 1, the
VCPU has been scheduled out since the last time the guest reset the bit.
 The guest can use an xchg to test-and-clear it.  The bit can be
accessed at any time, independent of the version field.
If one vCPU is preempted, and guest check it several times before this
vCPU is scheded in, then the first time we can get "vCPU is
preempted", however, since the field is cleared, the second time we
will get "vCPU is running".

Do you mean we should call record_steal_time() in both kvm_sched_in()
and kvm_sched_out() to record this field? Btw, if we should keep both
vcpu->preempted and kvm_steal_time's "vCPU preempted" field present
simultaneous?
I suspect you want something like so; except this has holes in.

We clear KVM_ST_PAD_PREEMPT before disabling preemption and we set it
after enabling it, this means that if we get preempted in between, the
vcpu is reported as running even though it very much is not.

Fixing that requires much larger surgery.

---
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2766723c951..117270df43b6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1997,8 +1997,29 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.pv_time_enabled = false;
 }
 
+static void update_steal_time_preempt(struct kvm_vcpu *vcpu)
+{
+	struct kvm_steal_time *st;
+
+	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
+		return;
+
+	if (unlikely(kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
+		return;
+
+	st = &vcpu->arch.st.steal;
+
+	st->pad[KVM_ST_PAD_PREEMPT] = 1; /* we've stopped running */
+
+	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
+		st, sizeof(struct kvm_steal_time));
+}
+
 static void record_steal_time(struct kvm_vcpu *vcpu)
 {
+	struct kvm_steal_time *st;
+
 	if (!(vcpu->arch.st.msr_val & KVM_MSR_ENABLED))
 		return;
 
@@ -2006,29 +2027,34 @@ static void record_steal_time(struct kvm_vcpu *vcpu)
 		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time))))
 		return;
 
-	if (vcpu->arch.st.steal.version & 1)
-		vcpu->arch.st.steal.version += 1;  /* first time write, random junk */
+	st = &vcpu->arch.st.steal;
+
+	if (st->version & 1) {
+		st->flags = KVM_ST_FLAG_PREEMPT;
+		st->version += 1;  /* first time write, random junk */
+	}
 
-	vcpu->arch.st.steal.version += 1;
+	st->version += 1;
 
 	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
-		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+		st, sizeof(struct kvm_steal_time));
 
 	smp_wmb();
 
-	vcpu->arch.st.steal.steal += current->sched_info.run_delay -
+	st->steal += current->sched_info.run_delay -
 		vcpu->arch.st.last_steal;
 	vcpu->arch.st.last_steal = current->sched_info.run_delay;
+	st->pad[KVM_ST_PAD_PREEMPT] = 0; /* we're about to start running */
 
 	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
-		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+		st, sizeof(struct kvm_steal_time));
 
 	smp_wmb();
 
-	vcpu->arch.st.steal.version += 1;
+	st->version += 1;
 
 	kvm_write_guest_cached(vcpu->kvm, &vcpu->arch.st.stime,
-		&vcpu->arch.st.steal, sizeof(struct kvm_steal_time));
+		st, sizeof(struct kvm_steal_time));
 }
 
 int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
@@ -6693,6 +6719,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 
 	preempt_enable();
 
+	update_steal_time_preempt(vcpu);
+
 	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 
 	/*
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help