Thread (28 messages), 5 authors, 2021-07-08

Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE

From: Jim Mattson <hidden>
Date: 2021-07-07 16:42:49
Also in: kvm, linux-doc, lkml

On Wed, Jul 7, 2021 at 8:09 AM Eduardo Habkost [off-list ref] wrote:
CCing libvir-list, Jiri Denemark, Michal Privoznik, so they are aware
that the definition of "supported CPU features" will probably become a
bit more complex in the future.

Has there ever been a clear definition? Family, model, and stepping,
for instance: are these the only values supported? That would make
cross-platform migration impossible. What about the vendor string? Is
that the only value supported? That would make cross-vendor migration
impossible. For the maximum input value for basic CPUID information
(CPUID.0H:EAX), is that the only value supported, or is it the maximum
value supported? On the various individual feature bits, does a '1'
imply that '0' is also supported, or is '1' the only value supported?
What about the feature bits with reversed polarity (e.g.
CPUID.(EAX=07H,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6])?

This API has never made sense to me. I have no idea how to interpret
what it is telling me.
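
To make the polarity question concrete, here is a minimal sketch of one possible reading of the "supported" mask for CPUID.(EAX=07H,ECX=0):EBX. It is only an illustration of the ambiguity, not how KVM, QEMU, or libvirt interpret the data. The bit positions are from the Intel SDM; the function name `guest_ebx_allowed` and the rule it applies (a normal bit may be cleared but not added, a reversed-polarity bit may be added but not cleared) are assumptions.

```python
# Bit positions in CPUID.(EAX=07H,ECX=0):EBX, per the Intel SDM.
HLE = 1 << 4
FDP_EXCPTN_ONLY = 1 << 6  # reversed polarity: 1 means a behaviour is absent
RTM = 1 << 11

MASK32 = 0xFFFFFFFF

# Assumption for this sketch: the set of bits treated as reversed polarity.
REVERSED_POLARITY = FDP_EXCPTN_ONLY


def guest_ebx_allowed(supported: int, guest: int) -> bool:
    """Hypothetical check under the "'1' implies '0' is also fine" reading.

    Normal bits: the guest may only set bits the supported mask sets.
    Reversed-polarity bits: if the supported mask sets the bit, the guest
    must set it too, because clearing it would promise missing behaviour.
    """
    normal = MASK32 & ~REVERSED_POLARITY
    extra_normal = guest & ~supported & normal
    dropped_reversed = supported & ~guest & REVERSED_POLARITY
    return extra_normal == 0 and dropped_reversed == 0
```

Under this reading, a guest with only RTM set is accepted against a supported mask of HLE|RTM, while a guest that clears FDP_EXCPTN_ONLY is rejected when the supported mask sets it. Whether that matches what the API intends is exactly the open question.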
On Tue, Jul 6, 2021 at 5:58 PM Paolo Bonzini [off-list ref] wrote:
On 06/07/21 23:33, Eduardo Habkost wrote:
On Tue, Jul 6, 2021 at 5:05 PM Paolo Bonzini [off-list ref] wrote:
It's a bit tricky, because HLE and RTM won't really behave well.  An old
guest that sees RTM=1 might end up retrying and aborting transactions
too much.  So I'm not sure that a QEMU "-cpu host" guest should have HLE
and RTM enabled.
Is the purpose of GET_SUPPORTED_CPUID to return what is supported by
KVM, or to return what "-cpu host" should enable by default? They are
conflicting requirements in this case.
In theory there is GET_EMULATED_CPUID for the former, so it should be
the latter.  In practice neither QEMU nor Libvirt uses it; maybe now we
have a good reason to add it, but note that userspace could also check
host RTM_ALWAYS_ABORT.
Returning HLE=1,RTM=1 in GET_SUPPORTED_CPUID makes existing userspace
take bad decisions until it's updated.

Returning HLE=0,RTM=0 in GET_SUPPORTED_CPUID prevents existing
userspace from resuming existing VMs (despite being technically
possible).

The first option has an easy workaround that doesn't require a
software update (disabling HLE/RTM in the VM configuration). The
second option doesn't have a workaround. I'm inclined towards the
first option.
The default has already been tsx=off for a while though, so checking
either GET_EMULATED_CPUID or host RTM_ALWAYS_ABORT in userspace might
also be feasible for those that are still on tsx=on.
This sounds like a perfect use case for GET_EMULATED_CPUID. My only
concern is breaking existing userspace.

But if this was already broken for a few kernel releases due to
tsx=off being the default, maybe GET_EMULATED_CPUID will be a
reasonable approach.

--
Eduardo
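
The split discussed in this message (one answer for what "-cpu host" enables by default, another for what an existing VM configuration may still request) could be sketched roughly as follows. This is a hypothetical userspace policy, not QEMU or libvirt code. The inputs are plain integers standing in for the leaf-7 EBX values a caller would obtain from KVM_GET_SUPPORTED_CPUID and KVM_GET_EMULATED_CPUID and for the host's leaf-7 EDX. Whether KVM would report HLE/RTM through the emulated list is the proposal under discussion, not established behaviour, and both function names are made up for this example.

```python
# CPUID.(EAX=07H,ECX=0):EBX bits, per the Intel SDM.
HLE = 1 << 4
RTM = 1 << 11
TSX_BITS = HLE | RTM

# CPUID.(EAX=07H,ECX=0):EDX bit 11.
RTM_ALWAYS_ABORT = 1 << 11


def tsx_default_for_host_model(supported_ebx: int, host_edx: int) -> int:
    """TSX bits a '-cpu host' style model would enable by default.

    Assumption: if the host reports RTM_ALWAYS_ABORT, transactions never
    succeed, so the default leaves HLE and RTM off even when they appear
    in the supported mask.
    """
    if host_edx & RTM_ALWAYS_ABORT:
        return 0
    return supported_ebx & TSX_BITS


def tsx_allowed_on_request(supported_ebx: int, emulated_ebx: int) -> int:
    """TSX bits an explicit VM configuration may still ask for.

    Assumption: bits reported only as emulated are not enabled by default
    but remain acceptable, so existing VMs can be resumed.
    """
    return (supported_ebx | emulated_ebx) & TSX_BITS
```

With this shape, a host that reports HLE/RTM only as emulated keeps them out of the default model while `tsx_allowed_on_request` still accepts a saved configuration that has them set.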