Re: [PATCH v2] drivers/virt: vmgenid: add vm generation id driver
From: Dmitry Safonov <hidden>
Date: 2020-11-20 21:18:51
Also in:
kvm, linux-doc, linux-s390, lkml, qemu-devel
Hello,

+Cc Eric, Adrian

On 11/19/20 6:36 PM, Alexander Graf wrote:
> On 19.11.20 18:38, Mike Rapoport wrote:
>> On Thu, Nov 19, 2020 at 01:51:18PM +0100, Alexander Graf wrote:
>>> On 19.11.20 13:02, Christian Borntraeger wrote:
>>>> On 16.11.20 16:34, Catangiu, Adrian Costin wrote:
>>>>> - Background
>>>>>
>>>>> The VM Generation ID is a feature defined by Microsoft (paper:
>>>>> http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
>>>>> multiple hypervisor vendors.
>>>>>
>>>>> The feature is required in virtualized environments by apps that
>>>>> work with local copies/caches of world-unique data such as random
>>>>> values, uuids, monotonically increasing counters, etc. Such apps
>>>>> can be negatively affected by VM snapshotting when the VM is
>>>>> either cloned or returned to an earlier point in time.
>>>>>
>>>>> The VM Generation ID is a simple concept meant to alleviate the
>>>>> issue by providing a unique ID that changes each time the VM is
>>>>> restored from a snapshot. The hw provided UUID value can be used
>>>>> to differentiate between VMs or different generations of the same
>>>>> VM.
>>>>>
>>>>> - Problem
>>>>>
>>>>> The VM Generation ID is exposed through an ACPI device by multiple
>>>>> hypervisor vendors, but neither the vendors nor upstream Linux
>>>>> have a default driver for it, leaving users to fend for
>>>>> themselves.
>
> [..]
>
>>> The only piece where I'm unsure is how this will interact with CRIU.
>>
>> To C/R applications that use /dev/vmgenid, CRIU needs to be aware of
>> it. Checkpointing and restoring within the same "VM generation"
>> shouldn't be a problem, but IMHO, making restore work after a genid
>> bump could be challenging.
>>
>> Alex, what scenario involving CRIU did you have in mind?
>
> You can in theory run into the same situation with containers that
> this patch is solving for virtual machines. You could for example do a
> snapshot of a prewarmed Java runtime with CRIU to get full JIT speeds
> starting from the first request.
>
> That however means you run into the problem of predictable randomness
> again.
>
>>> Can containers emulate ioctls and device nodes?
>>
>> Containers do not emulate ioctls but they can have /dev/vmgenid
>> inside the container, so applications can use it the same way as
>> outside the container.
>
> Hm. I suppose we could add a CAP_ADMIN ioctl interface to /dev/vmgenid
> (when container people get to the point of needing it) that sets the
> generation to "at least X". That way on restore, you could just call
> that with "generation at snapshot"+1.
>
> That also means we need to have this interface available without
> virtual machines then though, right?

Sounds like a good idea.

I guess vmgenid can be global on the host, rather than per-userns or
per-process, for simplicity. Later, if somebody hits a bottleneck on
restore when every process on the machine wakes up from read(), it
could be virtualized, but doing it now sounds too early.

The ioctl() should probably go under
checkpoint_restore_ns_capable(current_user_ns()), rather than
CAP_SYS_ADMIN (I believe it should be safe from DoS as only CRIU should
run with this capability, but it's worth documenting this).
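
To make the "at least X" idea concrete, here is a rough userspace
model of it. It is only a sketch: it is not kernel code and not part of
the patch. It assumes the generation is modelled as a single
host-global 64-bit counter, and the names vmgenid_set_min_generation(),
vmgenid_read_generation() and struct caller are hypothetical. The
boolean field stands in for what checkpoint_restore_ns_capable() would
decide in the kernel.

```c
/* Userspace model only, not kernel code. All names are hypothetical. */
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* One host-global generation counter (assumption: not per-userns). */
static _Atomic uint64_t vmgenid_generation;

/*
 * Stand-in for the capability check. In the kernel this decision would
 * come from checkpoint_restore_ns_capable(current_user_ns()).
 */
struct caller {
	bool has_checkpoint_restore;
};

/*
 * Raise the generation to at least min_gen; never lower it.
 * Returns 0 on success, -EPERM if the caller lacks the capability.
 */
static int vmgenid_set_min_generation(const struct caller *c,
				      uint64_t min_gen)
{
	uint64_t cur;

	if (!c->has_checkpoint_restore)
		return -EPERM;

	cur = atomic_load(&vmgenid_generation);
	while (cur < min_gen) {
		/* On failure, cur is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&vmgenid_generation,
						 &cur, min_gen))
			break;
	}
	return 0;
}

/* Readers observe the current generation. */
static uint64_t vmgenid_read_generation(void)
{
	return atomic_load(&vmgenid_generation);
}
```

On restore, a CRIU-like caller would pass "generation at snapshot"+1.
A request below the current value leaves the counter unchanged, so
concurrent restores cannot move it backwards.
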
Thanks,
Dmitry