Thread: 58 messages, 9 authors, 2021-09-30

Re: [PATCH mlx5-next 2/7] vfio: Add an API to check migration state transition validity

From: Jason Gunthorpe <jgg@ziepe.ca>
Date: 2021-09-30 16:24:51
Also in: kvm, linux-pci, linux-rdma, lkml

On Thu, Sep 30, 2021 at 06:32:07PM +0300, Max Gurtovoy wrote:
>> Just prior to open device the vfio pci layer will generate a FLR to
>> the function so we expect that post open_device has a fresh from reset
>> fully running device state.
> running also mean that the device doesn't have a clue on its internal state
> ? or running means unfreezed and unquiesced ?

The device just got FLR'd and it should be in a clean state and
operating. Think the VM is booting for the first time.

>>>> driver will see RESUMING toggle off so it will trigger a
>>>> de-serialization
>>> You mean stop serialization ?
>> No, I mean it will take all the migration data that has been uploaded
>> through the migration region and de-serialize it into active device
>> state.
> you should feed the device way before that.

I don't know what this means. When the resuming bit is set the
migration data buffer is wiped and userspace should begin loading
it. When the resuming bit is cleared whatever is in the migration
buffer is deserialized into the current device internal state.

It is the opposite of saving. When the saving bit is set the current
device state is serialized into the migration buffer and userspace
reads it out.
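
To make the bit-edge semantics above concrete, here is a minimal
userspace sketch in C. It is only an illustration, not driver code:
struct toy_dev, toy_set_state() and the MIG_* names are hypothetical,
and the bit values simply follow the ones used in this thread (001b
running, 100b resuming), with 010b assumed for saving.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit layout, following the values quoted in this thread. */
#define MIG_RUNNING  (1u << 0)
#define MIG_SAVING   (1u << 1)
#define MIG_RESUMING (1u << 2)

/* Hypothetical toy device: one word of "internal state" and a buffer. */
struct toy_dev {
	uint32_t mig_state;              /* current migration state bits */
	uint32_t internal;               /* stand-in for device internals */
	uint8_t  buf[sizeof(uint32_t)];  /* stand-in for the migration buffer */
	size_t   buf_len;
};

static void toy_set_state(struct toy_dev *d, uint32_t new_state)
{
	uint32_t set = new_state & ~d->mig_state;     /* bits going 0 -> 1 */
	uint32_t cleared = d->mig_state & ~new_state; /* bits going 1 -> 0 */

	/* Resuming set: wipe the buffer, userspace starts loading it. */
	if (set & MIG_RESUMING) {
		memset(d->buf, 0, sizeof(d->buf));
		d->buf_len = 0;
	}

	/* Resuming cleared: deserialize the buffer into device state. */
	if (cleared & MIG_RESUMING)
		memcpy(&d->internal, d->buf, sizeof(d->internal));

	/* Saving set: serialize device state into the buffer. */
	if (set & MIG_SAVING) {
		memcpy(d->buf, &d->internal, sizeof(d->internal));
		d->buf_len = sizeof(d->internal);
	}

	d->mig_state = new_state;
}
```

In this toy model a source device moved to MIG_SAVING fills its buffer,
and a destination moved to MIG_RESUMING and then back to MIG_RUNNING
picks up whatever userspace copied into its buffer in between.
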
> 1. you initialize at  _RUNNING bit == 001b. No problem.
>
> 2. state stream arrives, migration SW raise _RESUMING bit. should it be 101b
> or 100b ? for now it's 100b. But according to your statement is should be
> 101b (invalid today) since device state can change. right ?

Running means the device state changes independently. The controlled
change of the device state via deserializing the migration buffer is
different. Both running and saving commands need running to be zero.

ie commands that are marked invalid in the uapi comment are rejected
at the start - and that is probably the core helper we should provide.
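
A rough sketch of what such a helper could look like, in plain C so it
can be compiled outside the kernel. The names (MIG_*, mig_state_valid(),
mig_request_state()) are made up for illustration. Which combinations
count as invalid is an assumption here: the resuming bit combined with
any other bit (101b is called invalid above), plus any bit outside the
three currently defined ones. The uapi comment remains the authority.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout, following the values quoted in this thread. */
#define MIG_RUNNING  (1u << 0)
#define MIG_SAVING   (1u << 1)
#define MIG_RESUMING (1u << 2)
#define MIG_MASK     (MIG_RUNNING | MIG_SAVING | MIG_RESUMING)

/* Return true if the requested state is one this sketch accepts. */
static bool mig_state_valid(uint32_t state)
{
	/* Bits that are not defined yet (e.g. a future 1001b) are rejected. */
	if (state & ~MIG_MASK)
		return false;

	/* Assumption: resuming must be the only bit set (100b). */
	if (state & MIG_RESUMING)
		return state == MIG_RESUMING;

	return true;
}

/* Reject invalid requests up front, before any driver work is done. */
static int mig_request_state(uint32_t new_state)
{
	if (!mig_state_valid(new_state))
		return -EINVAL;
	return 0;
}
```

With this, a request for 101b fails with -EINVAL before the driver sees
it, while 001b and 100b pass through.
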

> 3. Then you should indicate that all the state was serialized to the device
> (actually to all the pci devices). 100b mean RESUMING and not RUNNING so
> maybe this can say RESUMED and state can't change now ?

State is not loaded into the device until the resuming bit is
cleared. There is no RESUMED state until we incorporate Artem's
proposal for an additional bit eg 1001b - running with DMA master
disabled.

Jason