Re: [RFC PATCH] vfio: Update/Clarify migration uAPI, add NDMA state
From: Cornelia Huck <cohuck@redhat.com>
Date: 2021-12-14 12:08:57
Also in:
kvm
On Mon, Dec 13 2021, Alex Williamson [off-list ref] wrote:
We do specify that a migration driver has discretion in using the error state for failed transitions, so there are options to simplify error handling. If we look at bit flips, we have: Initial state | Resuming | | Saving | | | Running | | | | Next states with multiple bit flips a) 0, 0, 0 (d) b) 0, 0, 1 (c) (e) c) 0, 1, 0 (b) (e) d) 0, 1, 1 (a) (e) e) 1, 0, 0 (b) (c) (d) f) 1, 0, 1 UNSUPPORTED g) 1, 1, 0 ERROR h) 1, 1, 1 INVALID We specify that we cannot pass through any invalid states during transition, so if I consider each bit to be a discrete operation and map all these multi-bit changes to a sequence of single bit flips, the only requirements are: 1) RESUMING must be cleared before setting SAVING or RUNNING 2) SAVING or RUNNING must be cleared before setting RESUMING I think the basis of your priority scheme comes from that. Ordering of the remaining items is more subtle though, for instance 0 -> SAVING | RUNNING can be broken down as: 0 -> SAVING SAVING -> SAVING | RUNNING vs 0 -> RUNNING RUNNING -> SAVING | RUNNING I'd give preference to enabling logging before running and I believe that holds for transition (e) -> (d) as well.
Agreed.
In the reverse case, SAVING | RUNNING -> 0 SAVING | RUNNING -> RUNNING RUNNING -> 0 vs SAVING | RUNNING -> SAVING SAVING -> 0 This one is more arbitrary. I tend to favor clearing RUNNING to stop the device first, largely because it creates nice symmetry in the resulting algorithm and follows the general principle that you discovered as well, converge towards zero by addressing bit clearing before setting.
That also makes sense to me.
So a valid algorithm would include:
int set_device_state(u32 old, u32 new, bool is_unwind)
{
if (old.RUNNING && !new.RUNNING) {
curr.RUNNING = 0;
if (ERROR) goto unwind;
}
if (old.SAVING && !new.SAVING) {
curr.SAVING = 0;
if (ERROR) goto unwind;
}
if (old.RESUMING && !new.RESUMING) {
curr.RESUMING = 0;
if (ERROR) goto unwind;
}
if (!old.RESUMING && new.RESUMING) {
curr.RESUMING = 1;
if (ERROR) goto unwind;
}
if (!old.SAVING && new.SAVING) {
curr.saving = 1;
if (ERROR) goto unwind;
}
if (!old.RUNNING && new.RUNNING) {
curr.RUNNING = 1;
if (ERROR) goto unwind;
}
return 0;
unwind:
if (!is_unwind) {
ret = set_device_state(curr, old, true);
if (ret) {
curr.raw = ERROR;
return ret;
}
}
return -EIO;
}I've stared at this and scribbled a bit on paper and I think this would work.
And I think that also addresses the claim that we're doomed to untested and complicated error code handling, we unwind by simply swapping the args to our set state function and enter the ERROR state should that recursive call fail.
Nod. As we clear RUNNING as the first thing and would only set RUNNING again as the last step, any transition to ERROR should be save.
I think it would be reasonable for documentation to present similar pseudo code as a recommendation, but ultimately the migration driver needs to come up with something that fits all the requirements.
Yes, something like this should go into Documentation/.