Thread (27 messages) 27 messages, 6 authors, 2021-10-15

Re: [PATCH v2] KVM: VMX: Enable Notify VM exit

From: Chenyi Qiang <hidden>
Date: 2021-09-02 09:28:28
Also in: lkml


On 8/3/2021 8:38 AM, Xiaoyao Li wrote:
On 8/2/2021 11:46 PM, Sean Christopherson wrote:
quoted
On Mon, Aug 02, 2021, Xiaoyao Li wrote:
quoted
On 7/31/2021 4:41 AM, Sean Christopherson wrote:
quoted
On Tue, May 25, 2021, Tao Xu wrote:
quoted
   #endif /* __KVM_X86_VMX_CAPS_H */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 4bceb5ca3a89..c0ad01c88dac 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -205,6 +205,10 @@ module_param(ple_window_max, uint, 0444);
   int __read_mostly pt_mode = PT_MODE_SYSTEM;
   module_param(pt_mode, int, S_IRUGO);
+/* Default is 0, less than 0 (for example, -1) disables notify 
window. */
+static int __read_mostly notify_window;
I'm not sure I like the idea of trusting ucode to select an 
appropriate internal
threshold.  Unless the internal threshold is architecturally defined 
to be at
least N nanoseconds or whatever, I think KVM should provide its own 
sane default.
E.g. it's not hard to imagine a scenario where a ucode patch gets 
rolled out that
adjusts the threshold and starts silently degrading guest performance.
You mean when internal threshold gets smaller somehow, and cases
false-positive that leads unexpected VM exit on normal instruction? 
In this
case, we set increase the vmcs.notify_window in KVM.
Not while VMs are running though.
quoted
I think there is no better to avoid this case if ucode changes internal
threshold. Unless KVM's default notify_window is bigger enough.
quoted
Even if the internal threshold isn't architecturally constrained, it 
would be very,
very helpful if Intel could publish the per-uarch/stepping 
thresholds, e.g. to give
us a ballpark idea of how agressive KVM can be before it risks false 
positives.
Even Intel publishes the internal threshold, we still need to provide a
final best_value (internal + vmcs.notify_window). Then what's that 
value?
The ideal value would be high enough to guarantee there are zero false 
positives,
yet low enough to prevent a malicious guest from causing instability 
in the host
by blocking events for an extended duration.  The problem is that 
there's no
magic answer for the threshold at which a blocked event would lead to 
system
instability, and without at least a general idea of the internal value 
there's no
answer at all.

IIRC, SGX instructions have a hard upper bound of 25k cycles before 
they have to
check for pending interrupts, e.g. it's why EINIT is interruptible.  
The 25k cycle
limit is likely a good starting point for the combined minimum.  
That's why I want
to know the internal minimum; if the internal minimum is _guaranteed_ 
to be >25k,
then KVM can be more aggressive with its default value.
OK. I will go internally to see if we can publish the internal threshold.
Hi Sean,

After syncing internally, we know that the internal threshold is not 
architectural but a model-specific value. It will be published in some 
place in future.

On Sapphire Rapids platform, the threshold is 128k. With this in mind, 
is it appropriate to set 0 as the default value of notify_window?
quoted
quoted
If we have an option for final best_value, then I think it's OK to 
just let
vmcs.notify_window = best_value. Then the true final value is 
best_value +
internal.
  - if it's a normal instruction, it should finish within best_value or
best_value + internal. So it makes no difference.
  - if it's an instruction in malicious case, it won't go to next 
instruction
whether wait for best_value or best_value + internal.
...
quoted
quoted
quoted
+
       vmcs_write32(PAGE_FAULT_ERROR_CODE_MASK, 0);
       vmcs_write32(PAGE_FAULT_ERROR_CODE_MATCH, 0);
       vmcs_write32(CR3_TARGET_COUNT, 0);           /* 22.2.1 */
@@ -5642,6 +5653,31 @@ static int handle_bus_lock_vmexit(struct 
kvm_vcpu *vcpu)
       return 0;
   }
+static int handle_notify(struct kvm_vcpu *vcpu)
+{
+    unsigned long exit_qual = vmx_get_exit_qual(vcpu);
+
+    if (!(exit_qual & NOTIFY_VM_CONTEXT_INVALID)) {
What does CONTEXT_INVALID mean?  The ISE doesn't provide any 
information whatsoever.
It means whether the VM context is corrupted and not valid in the VMCS.
Well that's a bit terrifying.  Under what conditions can the VM 
context become
corrupted?  E.g. if the context can be corrupted by an inopportune 
NOTIFY exit,
then KVM needs to be ultra conservative as a false positive could be 
fatal to a
guest.
Short answer is no case will set the VM_CONTEXT_INVALID bit.

VM_CONTEXT_INVALID is so fatal and IMHO it won't be set for any 
inopportune NOTIFY exit.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help