Re: KVM/arm64: Guest ABI changes do not appear rollback-safe | linux-arm-kernel


On Fri, Aug 27, 2021 at 12:40 AM Andrew Jones [off-list ref] wrote:
On Thu, Aug 26, 2021 at 06:49:27PM +0000, Oliver Upton wrote:
On Thu, Aug 26, 2021 at 09:37:42AM +0100, Marc Zyngier wrote:
On Wed, 25 Aug 2021 19:14:59 +0100,
Oliver Upton [off-list ref] wrote:
On Wed, Aug 25, 2021 at 8:07 AM Andrew Jones [off-list ref] wrote:
[...]

Thanks for including me Marc. I think you've mentioned all the examples
of why we don't generally expect N+1 -> N migrations to work that I
can think of. While some of the examples like get-reg-list could
eventually be eliminated if we had CPU models to tighten our machine type
state, I think N+1 -> N migrations will always be best effort at most.

I agree with giving userspace control over the exposure of the hypercalls
though. Using pseudo-registers for that purpose rather than a pile of
CAPs also seems reasonable to me.

And, while I don't think this patch is going to proceed, I thought I'd
point out that the opt-out approach doesn't help much with expanding
our migration support unless we require the VMM to be upgraded first.

And, even then, the (N_kern, N+1_vmm) -> (N+1_kern, N_vmm) case won't
work as expected, since the source enforces opt-out, but the destination
won't.
Right, there's going to need to be a fence in both kernel and VMM
versions. Before the fence, you can't roll back with either component.
Once on the other side of the fence, the user may freely migrate
between kernel + VMM combinations.

Also, since the VMM doesn't key off the kernel version, for the
most part N+1 VMMs won't know when they're supposed to opt-out or not,
leaving it to the user to ensure they consider everything. opt-in
usually only needs the user to consider what machine type they want to
launch.
Going the register route will implicitly require opt-out for all old
hypercalls. We exposed them unconditionally to the guest before, and
we must uphold that behavior. The default value for the bitmap will
have those features set. Any hypercalls added after that register
interface will then require explicit opt-in from userspace.
I disagree here. This makes the ABI inconsistent, and means that no
feature can be implemented without changing userspace. If you can deal
with the existing features, you should be able to deal with the next
lot.

With regards to the pseudoregister interface, how would a VMM discover
new bits? From my perspective, you need to have two bitmaps that the
VMM can get at: the set of supported feature bits and the active
bitmap of features for a running guest.
My proposal is that we have a single pseudo-register exposing the list
of features implemented by the kernel. Clear the bits you don't want, and write
back the result. As long as you haven't written anything, you have the
full feature set. That's pretty similar to the virtio feature
negotiation.
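
A toy sketch of the negotiation described above, for illustration only. The
class FeatureReg and its methods are hypothetical names and model the idea
in plain Python; they are not a KVM interface.

```python
class FeatureReg:
    """Toy model of one negotiation pseudo-register (hypothetical, not KVM)."""

    def __init__(self, supported):
        self.supported = supported  # bits this "kernel" implements
        self.value = supported      # never written: full feature set is exposed

    def read(self):
        return self.value

    def write(self, new):
        # Userspace may keep or clear bits, but cannot ask for a bit
        # that the kernel does not implement.
        if new & ~self.supported:
            raise ValueError("unsupported feature bits requested")
        self.value = new
```

Reading before any write returns everything the kernel implements, which is
what lets an unmodified userspace keep the full feature set.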
Ah, yes I agree. Thinking about it more, we will not need something
similar to KVM_GET_SUPPORTED_CPUID.

So then, for any register where userspace/KVM need to negotiate
features, the default value will return the maximum feature set that is
supported. If userspace wants to constrain features, read out the
register, make sure everything you want is there, and write it back
blowing away the superfluous bits. Given this, should we enforce ordering
on feature registers, such that a VMM can only write to the registers
before a VM is started?
That's a good idea. KVM_REG_ARM64_SVE_VLS has this type of constraint so
we can model the feature register control off that.
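
The read/verify/write-back flow and the "only before the VM is started"
ordering rule could look roughly like this. It is a minimal sketch under
assumptions: VmFeatureReg, negotiate() and the exception types are made up
for the example and do not reflect how KVM reports errors.

```python
class VmFeatureReg:
    """Toy feature register that freezes once the VM has started."""

    def __init__(self, supported):
        self.supported = supported
        self.value = supported   # default: maximum supported feature set
        self.started = False     # set to True when the (toy) VM first runs

    def write(self, new):
        if self.started:
            raise PermissionError("feature register is frozen after VM start")
        if new & ~self.supported:
            raise ValueError("unsupported feature bits requested")
        self.value = new


def negotiate(reg, wanted):
    """Read the register, check every wanted bit is there, write back."""
    missing = wanted & ~reg.value
    if missing:
        raise RuntimeError(f"kernel lacks required feature bits: {missing:#x}")
    reg.write(wanted)  # drops the superfluous bits
```

A VMM restoring a guest would call negotiate() with the bitmap saved on the
source and fail the restore if it raises, rather than run the guest with a
different ABI.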

Also, Reiji is working on making the identity registers writable for the
sake of feature restriction. The suggested negotiation interface would
be applicable there too, IMO.
This is interesting news. I'll look forward to the posting.

Many thanks to both you and Drew for working this out with me.
Thanks,
drew
Hey folks,

I have some lingering thoughts on this subject since we last spoke and
wanted to discuss.

I'm having a hard time figuring out how a VMM should handle a new
hypercall identity register introduced on a newer kernel. In order to
maintain guest ABI, the VMM would need to know about that register and
zero it when restoring an older guest.

Perhaps instead we could reserve a range of firmware registers as the
'hypercall identity' registers. Implement all of them as RAZ/WI by
default, encouraging userspace to zero these registers away for older
VMs but still allowing an old userspace to pick up new KVM features.
Doing so would align the hypercall identity registers with the feature
ID registers from the architecture.
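
To make the idea concrete, here is a rough model of a reserved range whose
unimplemented registers are RAZ/WI. Everything in it is an assumption for
illustration: the range size, the register IDs, and the names ToyKernel and
restore_guest are invented and do not correspond to real firmware register
encodings.

```python
RESERVED_IDS = range(8)  # hypothetical reserved 'hypercall identity' range


class ToyKernel:
    """Toy kernel: registers it does not implement read as zero, ignore writes."""

    def __init__(self, implemented):
        self.implemented = dict(implemented)  # reg id -> supported bitmap
        self.values = dict(implemented)       # default: everything enabled

    def get(self, reg_id):
        return self.values.get(reg_id, 0)     # RAZ for unimplemented registers

    def set(self, reg_id, value):
        if reg_id not in self.implemented:
            return                            # WI for unimplemented registers
        if value & ~self.implemented[reg_id]:
            raise ValueError("unsupported feature bits requested")
        self.values[reg_id] = value


def restore_guest(kernel, saved):
    # The VMM walks the whole reserved range, so a register it has never
    # heard of is zeroed on a newer kernel without any VMM change.
    for reg_id in RESERVED_IDS:
        kernel.set(reg_id, saved.get(reg_id, 0))
```

In this model a guest saved on a kernel that implements only register 0 is
restored on a kernel that also implements register 1 with register 1 forced
to zero, while a VMM that never writes the range still sees the defaults.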

Thoughts?

--
Thanks,
Oliver

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel