Re: [PATCH RFC v2] vfio: Documentation for the migration region
From: Cornelia Huck <cohuck@redhat.com>
Date: 2021-12-06 18:07:03
Also in: kvm
On Mon, Dec 06 2021, Jason Gunthorpe [off-list ref] wrote:
> On Mon, Dec 06, 2021 at 05:03:00PM +0100, Cornelia Huck wrote:
>>> If we're writing a specification, that's really a MAY statement: userspace MAY issue a reset to abort the RESUMING process and return the device to RUNNING. They MAY also write the device_state directly, which MAY return an error depending on various factors, such as whether data has been written to the migration state and whether that data is complete. If a failed transition results in an ERROR device_state, the user MUST issue a reset in order to return it to a RUNNING state without closing the interface.
>>
>> Are we actually writing a specification? If yes, we need to be more clear on what is mandatory (MUST), advised (SHOULD), or allowed (MAY). If I look at the current proposal, I'm not sure into which category some of the statements fall.
>
> I deliberately didn't use such formal language because this is far from what I'd consider an acceptable spec. It is more words about how things work and some kind of basis for agreement between user and kernel.
We don't really need formal language, but there are too many unclear statements, as the discussion above showed. Therefore my question: What are we actually writing? Even if it is not a formal specification, it still needs to be clear.
> Under Linus's "don't break userspace" guideline, whatever userspace ends up doing becomes the spec the kernel is wedded to, regardless of what we write down here.
All the more important that we actually agree before this is merged! I don't want choices hidden deep inside the mlx5 driver dictating what other drivers should do; it must be reasonably easy to figure out (including what is mandatory and what is flexible).
> Which basically means whatever mlx5 and qemu do after we go forward is the definitive spec, and we cannot change qemu in a way that is incompatible with mlx5 or introduce a new driver that is incompatible with qemu.
TBH, I'm not too happy with the current QEMU state, either. We need to take a long, hard look first and figure out what we need to do to make the QEMU support non-experimental. We're discussing a complex topic here, and we really don't want to perpetuate an unclear uAPI. This is where my push for more precise statements is coming from.
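For illustration only: the abort/recovery rules from the quoted proposal at the top of this message can be modeled as a tiny state machine, which makes it easy to see which outcomes are mandated (MUST) and which are left open (MAY). This is a sketch under stated assumptions. The state names, `struct dev`, `dev_set_state()` and `dev_reset()` are hypothetical and are not the VFIO uAPI. The condition under which leaving RESUMING fails (data written but incomplete) is one possible reading of "various factors", which the proposal does not pin down.

```c
#include <stdbool.h>

/* Hypothetical model of the device states discussed in this thread.
 * These names are illustrative only and are NOT the VFIO uAPI. */
enum dev_state { ST_RUNNING, ST_STOP, ST_RESUMING, ST_ERROR };

struct dev {
    enum dev_state state;
    bool data_written;   /* has userspace written any migration data? */
    bool data_complete;  /* was the written data a complete image? */
};

/* "userspace MAY issue a reset": a reset always returns the device to
 * RUNNING and discards any partially loaded migration data. */
static void dev_reset(struct dev *d)
{
    d->state = ST_RUNNING;
    d->data_written = false;
    d->data_complete = false;
}

/* "They MAY also write the device_state directly, which MAY return an
 * error". Assumption for this sketch: leaving RESUMING fails when data
 * was written but is incomplete, and the failure lands in ERROR. */
static int dev_set_state(struct dev *d, enum dev_state next)
{
    if (d->state == ST_ERROR)
        return -1;  /* the user MUST reset; direct writes are refused */

    if (d->state == ST_RESUMING && next != ST_RESUMING &&
        d->data_written && !d->data_complete) {
        d->state = ST_ERROR;
        return -1;
    }

    d->state = next;
    return 0;
}
```

In this model, a direct write on an incomplete image fails and leaves the device in `ST_ERROR`, further writes are refused, and only `dev_reset()` returns it to `ST_RUNNING`; leaving RESUMING before any data was written succeeds. Whether a real driver must behave this way is exactly the kind of MUST/MAY question raised above.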