Re: [PATCH v2 1/3] KVM: x86: Deflect unknown MSR accesses to user space
From: Jim Mattson <hidden>
Date: 2020-07-30 23:53:20
Also in:
kvm, lkml
On Thu, Jul 30, 2020 at 4:08 PM Alexander Graf [off-list ref] wrote:
> On 31.07.20 00:42, Jim Mattson wrote:
> > On Wed, Jul 29, 2020 at 4:59 PM Alexander Graf [off-list ref] wrote:
> > > MSRs are weird. Some of them are normal control registers, such as EFER. Some however are registers that really are model specific, not very interesting to virtualization workloads, and not performance critical. Others again are really just windows into package configuration.
> > >
> > > Out of these MSRs, only the first category is necessary to implement in kernel space. Rarely accessed MSRs, MSRs that should be fine tuned against certain CPU models and MSRs that contain information on the package level are much better suited for user space to process. However, over time we have accumulated a lot of MSRs that are not the first category, but still handled by in-kernel KVM code.
> > >
> > > This patch adds a generic interface to handle WRMSR and RDMSR from user space. With this, any future MSR that is part of the latter categories can be handled in user space.
> > >
> > > Furthermore, it allows us to replace the existing "ignore_msrs" logic with something that applies per-VM rather than on the full system. That way you can run productive VMs in parallel to experimental ones where you don't care about proper MSR handling.
> > >
> > > Signed-off-by: Alexander Graf <graf@amazon.com>
> >
> > Can we just drop em_wrmsr and em_rdmsr? The in-kernel emulator is already incomplete, and I don't think there is ever a good reason for kvm to emulate RDMSR or WRMSR if the VM-exit was for some other reason (and we shouldn't end up here if the VM-exit was for RDMSR or WRMSR). Am I missing something?
>
> On certain combinations of CPUs and guest modes, such as real mode on pre-Nehalem(?) at least, we are running all guest code through the emulator and thus may encounter a RDMSR or WRMSR instruction. I *think* we also do so for big real mode on more modern CPUs, but I'm not 100% sure.

Oh, gag me with a spoon! (BTW, we shouldn't have to emulate big real mode if the CPU supports unrestricted guest mode. If we do, something is probably wrong.)

> > You seem to be assuming that the instruction at CS:IP will still be RDMSR (or WRMSR) after returning from userspace, and we will come through kvm_{get,set}_msr_user_space again at the next KVM_RUN. That isn't necessarily the case, for a variety of reasons. I think the
>
> Do you have a particular situation in mind where that would not be the case and where we would still want to actually complete an MSR operation after the environment changed?

As far as userspace is concerned, if it has replied with error=0, the instruction has completed and retired. If the kernel executes a different instruction at CS:RIP, the state is certainly inconsistent for WRMSR exits. It would also be inconsistent for RDMSR exits if the RDMSR emulation on the userspace side had any side-effects.
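
To make the concern concrete, here is a minimal sketch of the alternative: when the exit to userspace happens, record a completion callback and run it on re-entry, instead of decoding CS:RIP a second time. This is a toy model, not KVM code. Every name in it (toy_vcpu, toy_exit_rdmsr, toy_reenter and so on) is hypothetical, and the `complete` field only loosely imitates the completion mechanism referred to in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, none is a KVM symbol. */
enum toy_insn { TOY_INSN_RDMSR, TOY_INSN_OTHER };

struct toy_vcpu {
    enum toy_insn at_rip;   /* what a fresh decode at CS:RIP would find */
    uint64_t rip, rax, rdx;
    uint64_t reply_data;    /* filled in by user space */
    int reply_error;        /* nonzero: user space rejected the access */
    int gp_pending;         /* set when a #GP should be injected */
    void (*complete)(struct toy_vcpu *v);   /* pending completion, if any */
};

/* Finish the RDMSR that caused the exit, without decoding CS:RIP again. */
static void toy_complete_rdmsr(struct toy_vcpu *v)
{
    if (v->reply_error) {
        v->gp_pending = 1;                  /* fault, RIP stays put */
        return;
    }
    v->rax = v->reply_data & 0xffffffffu;   /* low half goes to EAX */
    v->rdx = v->reply_data >> 32;           /* high half goes to EDX */
    v->rip += 2;                            /* RDMSR encodes as 0F 32 */
}

/* Exit to user space: remember how to finish the instruction. */
static void toy_exit_rdmsr(struct toy_vcpu *v)
{
    assert(v->complete == NULL);            /* one pending completion at a time */
    v->complete = toy_complete_rdmsr;
}

/* User space answers the exit. */
static void toy_user_reply(struct toy_vcpu *v, uint64_t data, int error)
{
    v->reply_data = data;
    v->reply_error = error;
}

/* Next run: drain the pending completion before doing anything else. */
static void toy_reenter(struct toy_vcpu *v)
{
    void (*cb)(struct toy_vcpu *) = v->complete;

    if (cb) {
        v->complete = NULL;
        cb(v);
    }
}
```

In this model, even if at_rip changes to TOY_INSN_OTHER between toy_exit_rdmsr() and toy_reenter(), the reply still lands in RAX/RDX and RIP advances past the original RDMSR, so the state matches what userspace believes has retired.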

> > 'completion' of the userspace instruction emulation should be done with the complete_userspace_io [sic] mechanism instead.
>
> Hm, that would avoid a roundtrip into guest mode, but add a cycle through the in-kernel emulator. I'm not sure that's a net win quite yet.
>
> > I'd really like to see this mechanism apply only in the case of invalid/unknown MSRs, and not for illegal reads/writes as well.
>
> Why? Any #GP inducing MSR access will be on the slow path. What's the problem if you get a few more of them in user space that you just bounce back as failing, so they actually do inject a fault?

I'm not concerned about the performance. I think I'm just biased because of what we have today. But since we're planning on dropping that anyway, I take it back. IIRC, the plumbing to make the distinction is a little painful, and I don't want to ask you to go there.
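
For illustration, a userspace handler that "bounces back" both kinds of access might look roughly like the sketch below: an unknown MSR and an illegal write to a known MSR both come back with the error flag set, which is the cue to inject a fault. The struct layout, the function names and the MSR indices are all made up for this example and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exit record; not the layout proposed in the patch. */
struct toy_msr_exit {
    uint32_t index;
    uint64_t data;
    int error;          /* nonzero asks the kernel side to inject #GP */
};

struct toy_msr {
    uint32_t index;
    uint64_t value;
    int writable;
};

/* Arbitrary example indices and values. */
static struct toy_msr toy_msrs[] = {
    { 0x1000, 0x42, 0 },    /* read-only */
    { 0x1001, 0,    1 },    /* read-write */
};

static struct toy_msr *toy_find(uint32_t index)
{
    for (size_t i = 0; i < sizeof(toy_msrs) / sizeof(toy_msrs[0]); i++)
        if (toy_msrs[i].index == index)
            return &toy_msrs[i];
    return NULL;
}

/* RDMSR: fails only when the MSR is unknown to user space. */
static void toy_handle_rdmsr(struct toy_msr_exit *e)
{
    struct toy_msr *m;

    assert(e != NULL);
    m = toy_find(e->index);
    e->error = (m == NULL);
    if (m)
        e->data = m->value;
}

/* WRMSR: fails for unknown MSRs and for writes to read-only ones. */
static void toy_handle_wrmsr(struct toy_msr_exit *e)
{
    struct toy_msr *m;

    assert(e != NULL);
    m = toy_find(e->index);
    e->error = (m == NULL || !m->writable);
    if (!e->error)
        m->value = e->data;
}
```

The handler does not need to know why the access reached it; it only reports success or failure, which is why sending illegal accesses to user space as well costs little beyond the extra slow-path exits.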