Re: [PATCH v2 1/3] KVM: x86: Deflect unknown MSR accesses to user space

From: Jim Mattson <hidden>
Date: 2020-07-30 23:53:20
Also in: kvm, lkml

On Thu, Jul 30, 2020 at 4:08 PM Alexander Graf [off-list ref] wrote:



On 31.07.20 00:42, Jim Mattson wrote:

On Wed, Jul 29, 2020 at 4:59 PM Alexander Graf [off-list ref] wrote:

MSRs are weird. Some of them are normal control registers, such as EFER.
Some however are registers that really are model specific, not very
interesting to virtualization workloads, and not performance critical.
Others again are really just windows into package configuration.

Out of these MSRs, only the first category is necessary to implement in
kernel space. Rarely accessed MSRs, MSRs that should be fine-tuned for
specific CPU models, and MSRs that contain package-level information
are much better suited for user space to process. However, over time we have
accumulated a lot of MSRs that are not in the first category, but are still
handled by in-kernel KVM code.

This patch adds a generic interface to handle WRMSR and RDMSR from user
space. With this, any future MSR that is part of the latter categories can
be handled in user space.
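A minimal sketch of the split described above, assuming a simplified, hypothetical vCPU model (`toy_vcpu`, `handle_wrmsr`, and `struct msr_exit` are illustrative names, not KVM internals): an MSR the kernel knows, such as EFER (index 0xC0000080), is handled in place, while any other index is recorded and deflected to user space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model; not KVM's real data structures. */
#define MSR_EFER 0xC0000080u /* architectural EFER index */

enum msr_result {
    MSR_HANDLED_IN_KERNEL, /* a "normal control register", kept in kernel */
    MSR_EXIT_TO_USERSPACE, /* anything else: deflect to the VMM */
};

/* What the exit would need to carry for user space to act on it. */
struct msr_exit {
    bool     is_write;
    uint32_t index;
    uint64_t data; /* value the guest tried to write (WRMSR only) */
};

struct toy_vcpu {
    uint64_t efer;
    struct msr_exit exit; /* filled in when the access is deflected */
};

static enum msr_result handle_wrmsr(struct toy_vcpu *v, uint32_t index,
                                    uint64_t data)
{
    switch (index) {
    case MSR_EFER:
        /* A real implementation would also validate reserved bits here. */
        v->efer = data;
        return MSR_HANDLED_IN_KERNEL;
    default:
        /* Unknown to the kernel: record the access, let user space decide. */
        v->exit.is_write = true;
        v->exit.index = index;
        v->exit.data = data;
        return MSR_EXIT_TO_USERSPACE;
    }
}
```

The point of the shape is that the in-kernel switch only ever needs to grow for performance-critical or architecturally central MSRs; everything else falls through to the default arm.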

Furthermore, it allows us to replace the existing "ignore_msrs" logic with
something that applies per-VM rather than on the full system. That way you
can run production VMs in parallel with experimental ones where you don't care
about proper MSR handling.
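On the user-space side, a VMM could consult a per-VM setting rather than the system-wide `ignore_msrs` module parameter. The sketch below assumes a hypothetical exit payload with `index`, `data`, and `error` fields, where `error = 0` means success and a nonzero value asks the kernel to inject #GP; `vm_config`, `emulated_msr`, and `vmm_handle_msr_exit` are made-up names for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical exit payload: the VMM reads index (and data for writes),
 * then fills in data (for reads) and error before resuming the vCPU. */
struct msr_exit {
    bool     is_write;
    uint8_t  error; /* 0 = success; nonzero = kernel should inject #GP */
    uint32_t index;
    uint64_t data;
};

/* Per-VM policy, replacing a single global knob. */
struct vm_config {
    bool ignore_unknown_msrs;
};

struct emulated_msr {
    uint32_t index;
    uint64_t value;
};

static void vmm_handle_msr_exit(const struct vm_config *cfg,
                                struct emulated_msr *table, size_t n,
                                struct msr_exit *msr)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].index != msr->index)
            continue;
        if (msr->is_write)
            table[i].value = msr->data; /* WRMSR: store the guest's value */
        else
            msr->data = table[i].value; /* RDMSR: return the stored value */
        msr->error = 0;
        return;
    }

    /* Not emulated by this VMM: apply this VM's policy. */
    if (cfg->ignore_unknown_msrs) {
        if (!msr->is_write)
            msr->data = 0; /* reads of ignored MSRs return 0 */
        msr->error = 0;    /* writes are silently dropped */
    } else {
        msr->error = 1;    /* guest sees #GP */
    }
}
```

Two VMs on the same host can then pass different `vm_config` values, which is the per-VM behavior the commit message is after.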

Signed-off-by: Alexander Graf <graf@amazon.com>

Can we just drop em_wrmsr and em_rdmsr? The in-kernel emulator is
already incomplete, and I don't think there is ever a good reason for
kvm to emulate RDMSR or WRMSR if the VM-exit was for some other reason
(and we shouldn't end up here if the VM-exit was for RDMSR or WRMSR).
Am I missing something?

On certain combinations of CPUs and guest modes, such as real mode on
pre-Nehalem(?) at least, we are running all guest code through the
emulator and thus may encounter a RDMSR or WRMSR instruction. I *think*
we also do so for big real mode on more modern CPUs, but I'm not 100% sure.

Oh, gag me with a spoon! (BTW, we shouldn't have to emulate big real
mode if the CPU supports unrestricted guest mode. If we do, something
is probably wrong.)

You seem to be assuming that the instruction at CS:IP will still be
RDMSR (or WRMSR) after returning from userspace, and we will come
through kvm_{get,set}_msr_user_space again at the next KVM_RUN. That
isn't necessarily the case, for a variety of reasons. I think the

Do you have a particular situation in mind where that would not be the
case and where we would still want to actually complete an MSR operation
after the environment changed?

As far as userspace is concerned, if it has replied with error=0, the
instruction has completed and retired. If the kernel executes a
different instruction at CS:RIP, the state is certainly inconsistent
for WRMSR exits. It would also be inconsistent for RDMSR exits if the
RDMSR emulation on the userspace side had any side-effects.

'completion' of the userspace instruction emulation should be done
with the complete_userspace_io [sic] mechanism instead.
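The deferred-completion idea can be illustrated with a simplified, hypothetical model: before exiting, the kernel records a completion callback, and on the next KVM_RUN it applies user space's answer to the vCPU state instead of re-decoding whatever instruction now sits at CS:RIP. For RDMSR, success means loading EDX:EAX and advancing RIP past the 2-byte `0F 32` opcode (assuming no prefixes); an error leaves RIP on the instruction and queues #GP. `toy_vcpu`, `complete_rdmsr`, and `toy_kvm_run` are illustrative names only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model in the spirit of complete_userspace_io. */
struct toy_vcpu;
typedef void (*complete_fn)(struct toy_vcpu *);

struct toy_vcpu {
    uint64_t rax, rdx, rip;
    bool     gp_pending; /* #GP queued for injection */
    uint8_t  msr_error;  /* filled in by user space */
    uint64_t msr_data;   /* filled in by user space on RDMSR */
    complete_fn complete_userspace_io;
};

/* Assumes an unprefixed RDMSR (0F 32), i.e. a 2-byte instruction. */
static void complete_rdmsr(struct toy_vcpu *v)
{
    if (v->msr_error) {
        v->gp_pending = true; /* fault: RIP stays on the instruction */
        return;
    }
    v->rax = (uint32_t)v->msr_data;         /* EAX = low 32 bits */
    v->rdx = (uint32_t)(v->msr_data >> 32); /* EDX = high 32 bits */
    v->rip += 2;                            /* instruction retired */
}

/* Runs the pending completion, if any, before re-entering the guest. */
static void toy_kvm_run(struct toy_vcpu *v)
{
    if (v->complete_userspace_io) {
        complete_fn fn = v->complete_userspace_io;
        v->complete_userspace_io = NULL; /* one-shot */
        fn(v);
    }
    /* ... enter the guest ... */
}
```

Because the callback is chosen when the exit happens, the result is applied to the access user space actually answered, which is the consistency property discussed in the thread.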

Hm, that would avoid a roundtrip into guest mode, but add a cycle
through the in-kernel emulator. I'm not sure that's a net win quite yet.

I'd really like to see this mechanism apply only in the case of
invalid/unknown MSRs, and not for illegal reads/writes as well.

Why? Any #GP-inducing MSR access will be on the slow path. What's the
problem if you get a few more of them in user space that you just bounce
back as failing, so they actually do inject a fault?

I'm not concerned about the performance. I think I'm just biased
because of what we have today. But since we're planning on dropping
that anyway, I take it back. IIRC, the plumbing to make the
distinction is a little painful, and I don't want to ask you to go
there.
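The distinction requested earlier in the thread, deflecting only unknown MSRs while letting illegal accesses to known MSRs fault in the kernel, could be expressed as a per-VM bitmask over access reasons. This is a hypothetical sketch: the enum values, the `known_msr` table, and the helper names are invented for illustration and do not describe how the plumbing would actually be done in KVM.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reasons an MSR access might be deflected (bit flags). */
enum msr_access_reason {
    MSR_OK             = 0,      /* kernel handles it normally */
    MSR_REASON_INVAL   = 1 << 0, /* known MSR, but the access would #GP */
    MSR_REASON_UNKNOWN = 1 << 1, /* kernel has no handler for this index */
};

struct known_msr {
    uint32_t index;
    bool     writable;
};

/* Classify an access against the MSRs the kernel knows about. */
static enum msr_access_reason
classify_msr_access(const struct known_msr *tbl, size_t n,
                    uint32_t index, bool is_write)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].index == index)
            return (is_write && !tbl[i].writable) ? MSR_REASON_INVAL
                                                  : MSR_OK;
    }
    return MSR_REASON_UNKNOWN;
}

/* Deflect only if this VM asked for this class of access. */
static bool should_deflect(enum msr_access_reason r, unsigned deflect_mask)
{
    return (r & deflect_mask) != 0;
}
```

With `deflect_mask = MSR_REASON_UNKNOWN`, illegal writes to a known read-only MSR would still fault in the kernel; adding `MSR_REASON_INVAL` sends those to user space too, which is the behavior the patch currently has.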
