Re: [PATCH v3 1/3] KVM: x86: Deflect unknown MSR accesses to user space
From: Alexander Graf <graf@amazon.com>
Date: 2020-08-03 10:08:53
Also in:
kvm, lkml
On 01.08.20 01:36, Jim Mattson wrote:
On Fri, Jul 31, 2020 at 2:50 PM Alexander Graf [off-list ref] wrote:quoted
MSRs are weird. Some of them are normal control registers, such as EFER. Some however are registers that really are model specific, not very interesting to virtualization workloads, and not performance critical. Others again are really just windows into package configuration. Out of these MSRs, only the first category is necessary to implement in kernel space. Rarely accessed MSRs, MSRs that should be fine tunes against certain CPU models and MSRs that contain information on the package level are much better suited for user space to process. However, over time we have accumulated a lot of MSRs that are not the first category, but still handled by in-kernel KVM code. This patch adds a generic interface to handle WRMSR and RDMSR from user space. With this, any future MSR that is part of the latter categories can be handled in user space. Furthermore, it allows us to replace the existing "ignore_msrs" logic with something that applies per-VM rather than on the full system. That way you can run productive VMs in parallel to experimental ones where you don't care about proper MSR handling. Signed-off-by: Alexander Graf <graf@amazon.com> --- v1 -> v2: - s/ETRAP_TO_USER_SPACE/ENOENT/g - deflect all #GP injection events to user space, not just unknown MSRs. That was we can also deflect allowlist errors later - fix emulator case v2 -> v3: - return r if r == X86EMUL_IO_NEEDED - s/KVM_EXIT_RDMSR/KVM_EXIT_X86_RDMSR/g - s/KVM_EXIT_WRMSR/KVM_EXIT_X86_WRMSR/g - Use complete_userspace_io logic instead of reply field - Simplify trapping code --- Documentation/virt/kvm/api.rst | 62 +++++++++++++++++++ arch/x86/include/asm/kvm_host.h | 6 ++ arch/x86/kvm/emulate.c | 18 +++++- arch/x86/kvm/x86.c | 106 ++++++++++++++++++++++++++++++-- include/trace/events/kvm.h | 2 +- include/uapi/linux/kvm.h | 10 +++ 6 files changed, 197 insertions(+), 7 deletions(-)diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 320788f81a05..79c3e2fdfae4 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rstThe new exit reasons should probably be mentioned here (around line 4866): .. note:: For KVM_EXIT_IO, KVM_EXIT_MMIO, KVM_EXIT_OSI, KVM_EXIT_PAPR and KVM_EXIT_EPR the corresponding operations are complete (and guest state is consistent) only after userspace has re-entered the kernel with KVM_RUN. The kernel side will first finish incomplete operations and then check for pending signals. Userspace can re-enter the guest with an unmasked signal pending to complete pending operations.
Great catch, thanks! Updated to also include the two new exit reasons.
Other than that, my remaining comments are all nits. Feel free to ignore them.quoted
+static int kvm_get_msr_user_space(struct kvm_vcpu *vcpu, u32 index)Return bool rather than int?
I'm not a big fan of bool returning APIs unless they have an "is" in
their name. In this case, the most readable path forward would probably
be an enum:
enum kvm_msr_user_space_retval {
KVM_MSR_IN_KERNEL,
KVM_MSR_BOUNCE_TO_USER_SPACE,
};
and then use that in the checks. But that adds a lot of boiler plate for
a fully internal, only a few dozen LOC big API. I don't think it's worth it.
quoted
+{ + if (!vcpu->kvm->arch.user_space_msr_enabled) + return 0; + + vcpu->run->exit_reason = KVM_EXIT_X86_RDMSR; + vcpu->run->msr.error = 0;Should we clear 'pad' in case anyone can think of a reason to use this space to extend the API in the future?
It can't hurt I guess.
quoted
+ vcpu->run->msr.index = index; + vcpu->arch.pending_user_msr = true; + vcpu->arch.complete_userspace_io = complete_emulated_rdmsr;complete_userspace_io could perhaps be renamed to complete_userspace_emulation (in a separate commit).
I think the complicated part of complete_userspace_io is to know it exists and understand how it works. Once you grasp these two bits, the name is just an artifact and IMHO easy enough to apply "beyond I/O".
quoted
+ + return 1; +} + +static int kvm_set_msr_user_space(struct kvm_vcpu *vcpu, u32 index, u64 data)Return bool rather than int?
Same replies as above :). I did get fed up with the amount of duplication though and created a generalized function in v4 that gets called by kvm_get/set_msr_user_space() to ensure that all fields are always set.
quoted
+{ + if (!vcpu->kvm->arch.user_space_msr_enabled) + return 0; + + vcpu->run->exit_reason = KVM_EXIT_X86_WRMSR; + vcpu->run->msr.error = 0;Same question about 'pad' as above.quoted
+ vcpu->run->msr.index = index; + vcpu->run->msr.data = data; + vcpu->arch.pending_user_msr = true; + vcpu->arch.complete_userspace_io = complete_emulated_wrmsr; + + return 1; +} +Reviewed-by: Jim Mattson <redacted>
Thanks a bunch for the review :) Alex Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879