Re: [PATCH mlx5-next 2/7] vfio: Add an API to check migration state transition validity
From: Kirti Wankhede <kwankhede@nvidia.com>
Date: 2021-09-24 09:37:53
Also in:
kvm, linux-pci, linux-rdma, lkml
On 9/24/2021 1:14 PM, Shameerali Kolothum Thodi wrote:
quoted
-----Original Message----- From: Max Gurtovoy [mailto:mgurtovoy@nvidia.com] Sent: 23 September 2021 14:56 To: Leon Romanovsky <leon@kernel.org>; Shameerali Kolothum Thodi [off-list ref] Cc: Doug Ledford <redacted>; Jason Gunthorpe [off-list ref]; Yishai Hadas [off-list ref]; Alex Williamson [off-list ref]; Bjorn Helgaas [off-list ref]; David S. Miller [off-list ref]; Jakub Kicinski [off-list ref]; Kirti Wankhede [off-list ref]; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; linux-pci@vger.kernel.org; linux-rdma@vger.kernel.org; netdev@vger.kernel.org; Saeed Mahameed [off-list ref] Subject: Re: [PATCH mlx5-next 2/7] vfio: Add an API to check migration state transition validity On 9/23/2021 2:17 PM, Leon Romanovsky wrote:quoted
On Thu, Sep 23, 2021 at 10:33:10AM +0000, Shameerali Kolothum Thodiwrote:quoted
quoted
quoted
-----Original Message----- From: Leon Romanovsky [mailto:leon@kernel.org] Sent: 22 September 2021 11:39 To: Doug Ledford <redacted>; Jason Gunthorpe[off-list ref]quoted
quoted
quoted
Cc: Yishai Hadas <yishaih@nvidia.com>; Alex Williamson [off-list ref]; Bjorn Helgaas [off-list ref];Davidquoted
quoted
quoted
S. Miller [off-list ref]; Jakub Kicinski [off-list ref]; Kirti Wankhede [off-list ref]; kvm@vger.kernel.org; linux-kernel@vger.kernel.org; linux-pci@vger.kernel.org; linux-rdma@vger.kernel.org; netdev@vger.kernel.org; Saeed Mahameed [off-list ref] Subject: [PATCH mlx5-next 2/7] vfio: Add an API to check migration state transition validity From: Yishai Hadas <yishaih@nvidia.com> Add an API in the core layer to check migration state transition validity as part of a migration flow. The valid transitions follow the expected usage as described in uapi/vfio.h and triggered by QEMU. This ensures that all migration implementations follow a consistent migration state machine. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Kirti Wankhede <kwankhede@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> --- drivers/vfio/vfio.c | 41+++++++++++++++++++++++++++++++++++++++++quoted
quoted
quoted
include/linux/vfio.h | 1 + 2 files changed, 42 insertions(+)diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index 3c034fe14ccb..c3ca33e513c8 100644 --- a/drivers/vfio/vfio.c +++ b/drivers/vfio/vfio.c@@ -1664,6 +1664,47 @@ static int vfio_device_fops_release(structinodequoted
quoted
quoted
*inode, struct file *filep) return 0; } +/** + * vfio_change_migration_state_allowed - Checks whether a migrationstatequoted
quoted
quoted
+ * transition is valid. + * @new_state: The new state to move to. + * @old_state: The old state. + * Return: true if the transition is valid. + */ +bool vfio_change_migration_state_allowed(u32 new_state, u32old_state)quoted
quoted
quoted
+{ + enum { MAX_STATE = VFIO_DEVICE_STATE_RESUMING }; + static const u8 vfio_from_state_table[MAX_STATE + 1][MAX_STATE +1] = {quoted
quoted
quoted
+ [VFIO_DEVICE_STATE_STOP] = { + [VFIO_DEVICE_STATE_RUNNING] = 1, + [VFIO_DEVICE_STATE_RESUMING] = 1, + }, + [VFIO_DEVICE_STATE_RUNNING] = { + [VFIO_DEVICE_STATE_STOP] = 1, + [VFIO_DEVICE_STATE_SAVING] = 1, + [VFIO_DEVICE_STATE_SAVING |VFIO_DEVICE_STATE_RUNNING]quoted
quoted
quoted
= 1,Do we need to allow _RESUMING state here or not? As per the "Statetransitions"quoted
quoted
section from uapi/linux/vfio.h,It looks like we missed this state transition. ThanksI'm not sure this state transition is valid. Kirti, When we would like to move from RUNNING to RESUMING ?I guess it depends on what you report as your dev default state. For HiSilicon ACC migration driver, we set the default to _RUNNING. And when the migration starts, the destination side Qemu, set the device state to _RESUMING(vfio_load_state()). From the documentation, it looks like the assumption on default state of the VFIO dev is _RUNNING.
That's right. in QEMU VFIO device state at init is running to maintain backward compatibility since migration support was added later. RUNNING -> RESUMING state transition is valid. Thanks, Kirti
" * 001b => Device running, which is the default state "quoted
Sameerali, can you please re-test and update if you see this transition ?Yes. And if I change the default state to _STOP, then the transition is from _STOP --> _RESUMING. But the documentation on State transitions doesn't have _STOP --> _RESUMING transition as valid. Thanks, Shameerquoted
quoted
quoted
" * 4. To start the resuming phase, the device state should be transitionedfromquoted
quoted
* the _RUNNING to the _RESUMING state." IIRC, I have seen that transition happening on the destination dev whiletesting thequoted
quoted
HiSilicon ACC dev migration. Thanks, Shameerquoted
+ }, + [VFIO_DEVICE_STATE_SAVING] = { + [VFIO_DEVICE_STATE_STOP] = 1, + [VFIO_DEVICE_STATE_RUNNING] = 1, + }, + [VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RUNNING]= {quoted
quoted
quoted
+ [VFIO_DEVICE_STATE_RUNNING] = 1, + [VFIO_DEVICE_STATE_SAVING] = 1, + }, + [VFIO_DEVICE_STATE_RESUMING] = { + [VFIO_DEVICE_STATE_RUNNING] = 1, + [VFIO_DEVICE_STATE_STOP] = 1, + }, + }; + + if (new_state > MAX_STATE || old_state > MAX_STATE) + return false; + + return vfio_from_state_table[old_state][new_state]; +} +EXPORT_SYMBOL_GPL(vfio_change_migration_state_allowed); + static long vfio_device_fops_unl_ioctl(struct file *filep, unsigned int cmd, unsigned long arg) {diff --git a/include/linux/vfio.h b/include/linux/vfio.h index b53a9557884a..e65137a708f1 100644 --- a/include/linux/vfio.h +++ b/include/linux/vfio.h@@ -83,6 +83,7 @@ extern struct vfio_device*vfio_device_get_from_dev(struct device *dev); extern void vfio_device_put(struct vfio_device *device); int vfio_assign_device_set(struct vfio_device *device, void *set_id); +bool vfio_change_migration_state_allowed(u32 new_state, u32old_state);quoted
quoted
quoted
/* events for the backend driver notify callback */ enum vfio_iommu_notify_type { -- 2.31.1