Thread: 18 messages, 6 authors, 2020-12-07

Re: [PATCH v2] drivers/virt: vmgenid: add vm generation id driver

From: Dmitry Safonov <hidden>
Date: 2020-11-20 21:18:51
Also in: kvm, linux-doc, linux-s390, lkml, qemu-devel

Hello,

+Cc Eric, Adrian

On 11/19/20 6:36 PM, Alexander Graf wrote:
On 19.11.20 18:38, Mike Rapoport wrote:
On Thu, Nov 19, 2020 at 01:51:18PM +0100, Alexander Graf wrote:
On 19.11.20 13:02, Christian Borntraeger wrote:
On 16.11.20 16:34, Catangiu, Adrian Costin wrote:
- Background

The VM Generation ID is a feature defined by Microsoft (paper:
http://go.microsoft.com/fwlink/?LinkId=260709) and supported by
multiple hypervisor vendors.

The feature is required in virtualized environments by apps that work
with local copies/caches of world-unique data such as random values,
uuids, monotonically increasing counters, etc.
Such apps can be negatively affected by VM snapshotting when the VM
is either cloned or returned to an earlier point in time.

The VM Generation ID is a simple concept meant to alleviate the issue
by providing a unique ID that changes each time the VM is restored
from a snapshot. The hw-provided UUID value can be used to
differentiate between VMs or different generations of the same VM.
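To make the mechanism concrete, here is a minimal sketch of how an application might use such an ID: it remembers the generation it observed when it built local state and discards that state when the value changes. `read_generation()`, `fake_hw_generation` and the cache layout are hypothetical stand-ins, and the ID is shortened to a 64-bit integer even though the text above describes a UUID.

```c
#include <stdint.h>

/* Hypothetical stand-in for the hypervisor-provided generation ID. The
 * text describes a UUID; a 64-bit integer is used here for brevity. */
static uint64_t fake_hw_generation = 1;

static uint64_t read_generation(void) { return fake_hw_generation; }

/* A locally generated value tagged with the generation it belongs to. */
struct tagged_id {
    uint64_t generation;
    uint64_t counter;
};

struct unique_cache {
    uint64_t generation; /* generation observed when state was built */
    uint64_t counter;    /* monotonically increasing within one generation */
    unsigned refills;    /* number of times local state was discarded */
};

/* Hand out the next value, first discarding local state if the VM was
 * restored from a snapshot or cloned since the last call. */
static struct tagged_id cache_next(struct unique_cache *c)
{
    uint64_t gen = read_generation();
    if (gen != c->generation) {
        c->generation = gen;
        c->counter = 0;
        c->refills++;
    }
    struct tagged_id id = { c->generation, c->counter++ };
    return id;
}
```

This check-then-use pattern still leaves a window in which a snapshot can be taken right after the comparison; the sketch does not try to close it.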

- Problem

The VM Generation ID is exposed through an ACPI device by multiple
hypervisor vendors, but neither the vendors nor upstream Linux provide a
default driver for it, leaving users to fend for themselves.
[..]
The only piece where I'm unsure is how this will interact with CRIU.
To C/R applications that use /dev/vmgenid CRIU need to be aware of it.
Checkpointing and restoring within the same "VM generation" shouldn't be
a problem, but IMHO, making restore work after genid bump could be
challenging.

Alex, what scenario involving CRIU did you have in mind?
You can in theory run into the same situation with containers that this
patch is solving for virtual machines. You could for example do a
snapshot of a prewarmed Java runtime with CRIU to get full JIT speeds
starting from the first request.

That however means you run into the problem of predictable randomness
again.
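A small model of the predictable-randomness problem, assuming a userspace PRNG whose state lives entirely in process memory (xorshift64 here, chosen only for brevity; it is not a CSPRNG). Two restores of the same checkpoint continue with identical output until something, such as a generation change, triggers a reseed. `prng_reseed()` is a hypothetical hook, not a JVM or CRIU API.

```c
#include <stdint.h>

/* Minimal xorshift64 generator standing in for a runtime's PRNG; any
 * PRNG whose state is plain process memory behaves the same way. */
struct prng { uint64_t s; };

static uint64_t prng_next(struct prng *p)
{
    uint64_t x = p->s;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    p->s = x;
    return x;
}

/* Hypothetical reseed hook: fold a new generation value into the state.
 * Multiplying by an odd constant spreads the value across all bits; a
 * real runtime would pull fresh entropy from the kernel instead. */
static void prng_reseed(struct prng *p, uint64_t generation)
{
    p->s ^= generation * 0x9E3779B97F4A7C15ULL;
    if (p->s == 0)
        p->s = 1; /* xorshift must never have an all-zero state */
}
```

Copying `struct prng` plays the role of a checkpoint: both copies produce the same next value, while copies reseeded with different generations diverge.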
Can containers emulate ioctls and device nodes?
Containers do not emulate ioctls, but they can have /dev/vmgenid inside
the container, so applications can use it the same way as outside the
container.
Hm. I suppose we could add a CAP_SYS_ADMIN ioctl interface to /dev/vmgenid
(when container people get to the point of needing it) that sets the
generation to "at least X". That way on restore, you could just call
that with "generation at snapshot"+1.
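One way to read the proposed "at least X" semantics, sketched in user space: the counter only moves forward, and a restore asks for the snapshot-time value plus one. The function names are hypothetical; the ioctl is only a proposal in this thread.

```c
#include <stdint.h>

/* User-space model of the proposed "set generation to at least X"
 * operation. The counter never moves backwards, so a stale or repeated
 * request from a restore tool cannot undo a later bump. */
static uint32_t vmgenid_generation;

static uint32_t genid_set_at_least(uint32_t x)
{
    if (x > vmgenid_generation)
        vmgenid_generation = x;
    return vmgenid_generation;
}

/* Checkpoint side: record the generation the image was taken under. */
static uint32_t checkpoint_generation(void) { return vmgenid_generation; }

/* Restore side: request "generation at snapshot" + 1. */
static uint32_t restore_bump(uint32_t gen_at_snapshot)
{
    return genid_set_at_least(gen_at_snapshot + 1);
}
```

Restoring the same image twice on one host yields the same value both times. That still tells each restored process its state is stale, because it compares against the snapshot-time value, but the value itself is not unique per restore.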

That also means we need to have this interface available without virtual
machines then though, right?
Sounds like a good idea.
I guess vmgenid can be global on the host, rather than per-userns or
per-process, for simplicity. Later, if somebody hits a bottleneck on
restore when every process on the machine wakes up from read(), it could
be virtualized, but doing that now seems premature.
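A user-space model of a single host-wide generation with blocking readers, assuming read() on the device blocks until the generation differs from the caller's last-seen value. A mutex and condition variable stand in for the kernel wait queue; all names are hypothetical.

```c
#include <pthread.h>
#include <stdint.h>

/* One host-wide generation counter shared by every reader. */
static pthread_mutex_t gen_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t gen_changed = PTHREAD_COND_INITIALIZER;
static uint32_t global_gen;

/* Block until the generation differs from last_seen, then return it. */
static uint32_t wait_for_new_generation(uint32_t last_seen)
{
    pthread_mutex_lock(&gen_lock);
    while (global_gen == last_seen)
        pthread_cond_wait(&gen_changed, &gen_lock);
    uint32_t now = global_gen;
    pthread_mutex_unlock(&gen_lock);
    return now;
}

static void bump_generation(void)
{
    pthread_mutex_lock(&gen_lock);
    global_gen++;
    /* A single global counter wakes every waiter on the host at once,
     * which is the restore-time bottleneck anticipated above. */
    pthread_cond_broadcast(&gen_changed);
    pthread_mutex_unlock(&gen_lock);
}

/* Reader thread: *arg holds the last-seen generation on entry and the
 * generation the reader woke up with on return. */
static void *reader_thread(void *arg)
{
    uint32_t *gen = arg;
    *gen = wait_for_new_generation(*gen);
    return NULL;
}
```

Per-userns virtualization would replace `global_gen` with one counter per namespace, so a bump would only wake readers in the affected namespace.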

The ioctl() should probably be gated by
checkpoint_restore_ns_capable(current_user_ns()) rather than
CAP_SYS_ADMIN (I believe it should be safe from DoS, as only CRIU should
run with this capability, but this is worth documenting).
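A sketch of how the permission check on such an ioctl could behave, modeled in user space. In the kernel, checkpoint_restore_ns_capable() accepts either CAP_CHECKPOINT_RESTORE or CAP_SYS_ADMIN in the given user namespace, so gating on it admits CRIU running with only CAP_CHECKPOINT_RESTORE without shutting out existing CAP_SYS_ADMIN callers. The `struct caller` type and the handler are hypothetical, and namespaces are ignored.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified caller credentials; user namespaces are ignored here. */
struct caller {
    bool cap_sys_admin;
    bool cap_checkpoint_restore;
};

/* Mirrors the logic of checkpoint_restore_ns_capable(): either
 * capability is sufficient. */
static bool may_checkpoint_restore(const struct caller *c)
{
    return c->cap_checkpoint_restore || c->cap_sys_admin;
}

static uint32_t model_generation;

/* Hypothetical "set generation to at least X" ioctl handler. */
static long model_set_at_least_ioctl(const struct caller *c, uint32_t x)
{
    if (!may_checkpoint_restore(c))
        return -EPERM;
    if (x > model_generation)
        model_generation = x;
    return 0;
}
```

Because any permitted caller can bump the counter and wake every reader, documenting who is expected to hold the capability matters for the DoS argument.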

Thanks,
         Dmitry