Re: KVM/arm64: Guest ABI changes do not appear rollback-safe | linux-arm-kernel

quoted

On Thu, 30 Sep 2021 18:24:23 +0100,
Oliver Upton [off-list ref] wrote:
Hey Marc,

On Thu, Sep 30, 2021 at 12:32 AM Marc Zyngier [off-list ref] wrote:
Hi Oliver,

On Wed, 29 Sep 2021 19:22:05 +0100,
Oliver Upton [off-list ref] wrote:
I have some lingering thoughts on this subject since we last spoke and
wanted to discuss.

I'm having a hard time figuring out how a VMM should handle a new
hypercall identity register introduced on a newer kernel. In order to
maintain guest ABI, the VMM would need to know about that register and
zero it when restoring an older guest.
Just as it would need to be able to discover any new system register
exposed by default, as it happens at all times. Which is why we have a
way to discover all the registers, architected or not.

Perhaps instead we could reserve a range of firmware registers as the
'hypercall identity' registers. Implement all of them as RAZ/WI by
default, encouraging userspace to zero these registers away for older
VMs but still allowing an old userspace to pick up new KVM features.
Doing so would align the hypercall identity registers with the feature
ID registers from the architecture.
The range already exists in the form of the "coprocessor" 0x14. I
don't see the need to expose it as RAZ/WI, however. If userspace
doesn't know about how this pseudo-register works, it won't be able to
program it anyway.

I don't buy the parallel with the ID-regs either. They are RAZ/WI by
default so that they don't UNDEF at runtime. The meaning of a RAZ
id-register is also well defined (feature not implemented), and the
CPU cannot write to them. In a way, the ID-regs *are* the enumeration
mechanism.

Our firmware registers don't follow the same rules. Userspace can
write to them, and there is no such "not implemented" rule (case in
point, PSCI). We also have a separate enumeration mechanism
(GET_ONE_REG), which is (more or less) designed for userspace to find
what is implemented.

For these reasons, I don't immediately see the point of advertising a
set of registers ahead of time, before userspace grows an
understanding of what these registers mean.
Supposing we don't preallocate some hypercall ID registers, how can we
safely migrate a guest from an older kernel to newer kernel? Ideally,
we would preserve the hypercall feature set across the migration which
could work for a while with the first set of registers that get
defined, but whenever a new hypercall firmware register comes along
then the VMM will be clueless to the new ABI.
My expectations were that whenever userspace writes a set of firmware
register, all the defaults are invalidated. For example say that
host-A know about a single hypercall register, while host-B knows
about two. Upon writing to the first register, the host would clear
the set of available services in the second one. If userspace
eventually writes there, the value would stick if valid.

Also, remember that pseudo-registers don't have to be 64bit. We could
define a new class of hypercall-specific registers that would be much
wider, and thus have a larger write granularity (KVM supports anything
from 8 to 2048 bits). This would make it pretty easy to implement the
above.

Fundamentally, I don't think userspace should need a patch to preserve
ABI on a newer kernel. Despite that, it would seem that userspace will
need to learn of any firmware registers that control hypercall
features which come after the initial set that gets proposed. If
KVM_GET_REG_LIST were to disambiguate between ID registers (hypercall,
architectural feature ID registers) from other parts of the vCPU
state, it would be clear to what registers to zero on a newer kernel.
Apologies if it is distracting to mention the feature ID registers
here, but both are on my mind currently and want to make sure there is
some consistency in how features get handled on newer kernels,
architected or not.
The problem I see is that we will always need to grow the HC register
space one way or another, no matter how many we reserve. Your approach
only works if we don't exceed that particular range. Maybe it will
never be a problem, but it remains that this is not scalable.

If we wanted to be safe, we'd reserve the whole of the possible space
as defined by the SMCCC spec. Given that we currently have two HC
spaces (the ARM-controlled one, and the KVM-specific one), the
function space being 16bits in each case, that's 16kB worth of zeroes
that userspace has to save/restore at all times... I'm not overly
enthusiastic.

Thanks,

	M.

-- 
Without deviation from the norm, progress is not possible.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help