RE: [PATCH v3 22/22] kvm: x86: Disable interception for IA32_XFD on demand
From: "Tian, Kevin" <kevin.tian@intel.com>
Date: 2021-12-31 09:43:06
From: Tian, Kevin
Sent: Thursday, December 30, 2021 3:05 PM
The new change looks like this:
static void handle_nm_fault_irqoff(struct kvm_vcpu *vcpu)
{
	/*
	 * Save xfd_err to guest_fpu before interrupts are enabled, so the
	 * guest value is not clobbered by host activity before the guest
	 * has a chance to consume it.
	 *
	 * Since trapping #NM starts when xfd write interception is
	 * disabled, use this flag to guard the saving operation. This
	 * implies a no-op for a non-xfd #NM due to L1 interception.
	 *
	 * Queuing the exception is done in vmx_handle_exit.
	 */
	if (vcpu->arch.xfd_no_write_intercept)
		rdmsrl(MSR_IA32_XFD_ERR, vcpu->arch.guest_fpu.xfd_err);
}
In the final series it will first check vcpu->arch.guest_fpu.fpstate->xfd,
before the disable-interception patch is applied, and then change to
the above form, similar to your suggestion on
vmx_update_exception_bitmap().
Whether to check msr_bitmap or an extra flag is an orthogonal open question.
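To illustrate why the save has to happen in the irqoff path and why the flag guards it, here is a minimal userspace model. It is only a sketch: fake_msr_xfd_err, struct toy_vcpu and all toy_* helpers are hypothetical stand-ins, not kernel interfaces; only the flag name xfd_no_write_intercept is taken from the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware MSR_IA32_XFD_ERR register. */
static uint64_t fake_msr_xfd_err;

struct toy_vcpu {
	bool xfd_no_write_intercept;	/* same name as the flag in the patch */
	uint64_t guest_xfd_err;		/* models guest_fpu.xfd_err */
};

/* Models the irqoff step: snapshot the MSR only when the guard flag is set. */
static void toy_nm_fault_irqoff(struct toy_vcpu *vcpu)
{
	assert(vcpu != NULL);
	if (vcpu->xfd_no_write_intercept)
		vcpu->guest_xfd_err = fake_msr_xfd_err;
}

/* Models host activity, after interrupts are enabled, clobbering the MSR. */
static void toy_host_activity(void)
{
	fake_msr_xfd_err = 0;
}

/* Returns the xfd_err value that would later be visible to the guest. */
static uint64_t toy_run(bool no_write_intercept, uint64_t hw_value)
{
	struct toy_vcpu vcpu = {
		.xfd_no_write_intercept = no_write_intercept,
		.guest_xfd_err = 0,
	};

	fake_msr_xfd_err = hw_value;	/* value latched by the #NM */
	toy_nm_fault_irqoff(&vcpu);	/* runs before interrupts are enabled */
	toy_host_activity();		/* may overwrite the live MSR */
	return vcpu.guest_xfd_err;
}
```

In this model, toy_run(true, v) returns v even though the live register is overwritten afterwards, while toy_run(false, v) leaves the saved value at zero, which corresponds to the no-op case for a non-xfd #NM.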
Then:
static int handle_exception_nmi(struct kvm_vcpu *vcpu)
{
	...
	if (is_machine_check(intr_info) || is_nmi(intr_info))
		return 1; /* handled by handle_exception_nmi_irqoff() */

	/*
	 * Queue the exception here instead of in handle_nm_fault_irqoff().
	 * This ensures the nested_vmx check is not skipped so vmexit can
	 * be reflected to L1 (when it intercepts #NM) before reaching this
	 * point.
	 */
	if (is_nm_fault(intr_info)) {
		kvm_queue_exception(vcpu, NM_VECTOR);
		return 1;
	}
	...
}
Regarding testing the non-AMX nested #NM usage: it may be difficult to
trigger from a modern OS. As the comment in the Linux #NM handler says, #NM is
expected only for XFD or for math emulation when the FPU is missing. So we plan
to run a selftest in L1 that sets CR0.TS and then touches the FPU registers.
For the L1 kernel we will run two binaries, one trapping #NM and the other
not.
We have verified this scenario and did not find any problems.
The selftest is basically as follows:
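static void guest_code(void)
{
	unsigned long cr0;

	/* Set CR0.TS so that the next x87 instruction raises #NM. */
	cr0 = read_cr0();
	cr0 |= X86_CR0_TS;
	write_cr0(cr0);
	asm volatile("fnop");
}

static void guest_nm_handler(void)
{
	unsigned long cr0;

	/* Clear CR0.TS so that the faulting fnop succeeds on retry. */
	cr0 = read_cr0();
	cr0 &= ~X86_CR0_TS;
	write_cr0(cr0);
}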
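The expected behaviour of this selftest can be modelled in plain userspace C as a sanity check of the logic. This is only a sketch under stated assumptions: struct toy_cpu and the toy_* functions are hypothetical and simulate the CR0.TS check in software; nothing here touches the real CR0 or executes in a guest.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CR0_TS (1UL << 3)	/* CR0.TS is bit 3 of CR0 */

struct toy_cpu {
	unsigned long cr0;
	unsigned int nm_count;	/* number of #NM faults delivered */
};

/* Mirrors guest_nm_handler(): clear TS so the retry can succeed. */
static void toy_nm_handler(struct toy_cpu *cpu)
{
	cpu->cr0 &= ~TOY_CR0_TS;
	cpu->nm_count++;
}

/*
 * Models an x87 instruction such as fnop: while TS is set it faults
 * with #NM and is retried after the handler returns. Returns true once
 * the instruction completes, false if it still faults after one retry.
 */
static bool toy_fnop(struct toy_cpu *cpu)
{
	int attempt;

	for (attempt = 0; attempt < 2; attempt++) {
		if (!(cpu->cr0 & TOY_CR0_TS))
			return true;
		toy_nm_handler(cpu);
	}
	return false;
}

/* Mirrors guest_code(): set TS, then touch the FPU. Returns the #NM count. */
static unsigned int toy_guest_code(void)
{
	struct toy_cpu cpu = { .cr0 = 0, .nm_count = 0 };
	bool ok;

	cpu.cr0 |= TOY_CR0_TS;
	ok = toy_fnop(&cpu);
	assert(ok);
	return cpu.nm_count;
}
```

In this model exactly one #NM is delivered per run of toy_guest_code(), and no fault occurs when TS is never set.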
We run the selftest in L1 to create a nested scenario.
When L1 intercepts #NM:
(L2) fnop
(L0) #NM vmexit
(L0) reflect a virtual vmexit (reason #NM) to L1
(L1) #NM vmexit
(L1) queue #NM exception to L2
(L2) guest_nm_handler()
(L2) fnop (succeeds)
When L1 doesn't intercept #NM:
(L2) fnop
(L0) #NM vmexit
(L0) queue #NM exception to L2
(L2) guest_nm_handler()
(L2) fnop (succeeds)
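The two flows above differ only in whether L1 has the #NM bit set in its exception bitmap. A small sketch of that decision follows; toy_nm_flow() and the TOY_* names are hypothetical, and the model assumes L0 decides purely on bit 7 (the #NM vector) of L1's exception bitmap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NM_VECTOR 7	/* #NM is exception vector 7 */

/* Step traces copied from the two flows listed above. */
static const char *const toy_intercepted[] = {
	"(L2) fnop",
	"(L0) #NM vmexit",
	"(L0) reflect a virtual vmexit (reason #NM) to L1",
	"(L1) #NM vmexit",
	"(L1) queue #NM exception to L2",
	"(L2) guest_nm_handler()",
	"(L2) fnop succeeds",
};

static const char *const toy_not_intercepted[] = {
	"(L2) fnop",
	"(L0) #NM vmexit",
	"(L0) queue #NM exception to L2",
	"(L2) guest_nm_handler()",
	"(L2) fnop succeeds",
};

/*
 * Returns the expected step trace for a given L1 exception bitmap and
 * stores the number of steps in *nsteps.
 */
static const char *const *toy_nm_flow(uint32_t l1_exception_bitmap,
				      size_t *nsteps)
{
	assert(nsteps != NULL);

	if (l1_exception_bitmap & (1u << TOY_NM_VECTOR)) {
		*nsteps = sizeof(toy_intercepted) / sizeof(toy_intercepted[0]);
		return toy_intercepted;
	}

	*nsteps = sizeof(toy_not_intercepted) / sizeof(toy_not_intercepted[0]);
	return toy_not_intercepted;
}
```

A test harness could compare the trace returned here against the exits actually observed in each configuration.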
Please let us know if any further tests are needed here.
Thanks
Kevin