Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices

From: Alex Williamson <hidden>
Date: 2021-10-20 16:52:46
Also in: kvm, linux-pci

[Cc +dgilbert, +cohuck]

On Wed, 20 Oct 2021 11:28:04 +0300
Yishai Hadas [off-list ref] wrote:

On 10/20/2021 2:04 AM, Jason Gunthorpe wrote:

quoted

On Tue, Oct 19, 2021 at 02:58:56PM -0600, Alex Williamson wrote:

quoted

I think that gives us this table:

|   NDMA   | RESUMING |  SAVING  |  RUNNING |
+----------+----------+----------+----------+ ---
|     X    |     0    |     0    |     0    |  ^
+----------+----------+----------+----------+  |
|     0    |     0    |     0    |     1    |  |
+----------+----------+----------+----------+  |
|     X    |     0    |     1    |     0    |
+----------+----------+----------+----------+  NDMA value is either compatible
|     0    |     0    |     1    |     1    |  to existing behavior or don't
+----------+----------+----------+----------+  care due to redundancy vs
|     X    |     1    |     0    |     0    |  !_RUNNING/INVALID/ERROR
+----------+----------+----------+----------+
|     X    |     1    |     0    |     1    |  |
+----------+----------+----------+----------+  |
|     X    |     1    |     1    |     0    |  |
+----------+----------+----------+----------+  |
|     X    |     1    |     1    |     1    |  v
+----------+----------+----------+----------+ ---
|     1    |     0    |     0    |     1    |  ^
+----------+----------+----------+----------+  Desired new useful cases
|     1    |     0    |     1    |     1    |  v
+----------+----------+----------+----------+ ---

Specifically, rows 1, 3, 5 with NDMA = 1 are valid states a user can
set which are simply redundant to the NDMA = 0 cases.

It seems right

quoted

Row 6 remains invalid due to lack of support for post-copy (_RESUMING
| _RUNNING) and therefore cannot be set by userspace.  Rows 7 & 8
are error states and cannot be set by userspace.

I wonder, did Yishai's series capture this row 6 restriction? Yishai?


It seems so, by using the check below, which includes the
!VFIO_DEVICE_STATE_VALID clause.

if (old_state == VFIO_DEVICE_STATE_ERROR ||
    !VFIO_DEVICE_STATE_VALID(state) ||
    (state & ~MLX5VF_SUPPORTED_DEVICE_STATES))
        return -EINVAL;

Which is:

#define VFIO_DEVICE_STATE_VALID(state) \
     (state & VFIO_DEVICE_STATE_RESUMING ? \
     (state & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_RESUMING : 1)
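To make the row 6 restriction concrete, here is a minimal userspace
sketch that evaluates the macro over all eight RESUMING/SAVING/RUNNING
combinations.  The bit values are local mirrors of the migration uAPI
definitions and are assumptions of this sketch, as are the helper names
state_is_valid(), count_valid_states() and self_check(); the macro
argument is parenthesized here for safety, otherwise the logic is the
same as quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the device_state bits (assumed values for this sketch). */
#define VFIO_DEVICE_STATE_RUNNING  (1u << 0)
#define VFIO_DEVICE_STATE_SAVING   (1u << 1)
#define VFIO_DEVICE_STATE_RESUMING (1u << 2)
#define VFIO_DEVICE_STATE_MASK     (VFIO_DEVICE_STATE_RUNNING | \
                                    VFIO_DEVICE_STATE_SAVING | \
                                    VFIO_DEVICE_STATE_RESUMING)

/* If _RESUMING is set, it must be the only bit set within the mask. */
#define VFIO_DEVICE_STATE_VALID(state) \
	((state) & VFIO_DEVICE_STATE_RESUMING ? \
	 ((state) & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_RESUMING : 1)

/* Hypothetical helper: wraps the macro so it can be called like a function. */
static bool state_is_valid(uint32_t state)
{
	return VFIO_DEVICE_STATE_VALID(state);
}

/* Hypothetical helper: counts how many of the 8 combinations pass. */
static unsigned int count_valid_states(void)
{
	unsigned int n = 0;

	for (uint32_t s = 0; s <= VFIO_DEVICE_STATE_MASK; s++)
		if (state_is_valid(s))
			n++;
	return n;
}

/* Rows 1-5 of the table pass, rows 6-8 (_RESUMING plus another bit) fail. */
void self_check(void)
{
	assert(state_is_valid(VFIO_DEVICE_STATE_RESUMING));
	assert(!state_is_valid(VFIO_DEVICE_STATE_RESUMING |
			       VFIO_DEVICE_STATE_RUNNING));
	assert(count_valid_states() == 5);
}
```

Under these assumptions the macro alone rejects _RESUMING | _RUNNING
(row 6) as well as rows 7 and 8, independent of the driver's
supported-states mask.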

quoted

Like other bits, setting the bit should be effective at the completion
of writing device state.  Therefore the device would need to flush any
outbound DMA queues before returning.

Yes, the device commands are expected to achieve this.

quoted

The question I was really trying to get to though is whether we have a
supportable interface without such an extension.  There's currently
only an experimental version of vfio migration support for PCI devices
in QEMU (afaik),

If I recall this only matters if you have a VM that is causing
migratable devices to interact with each other. So long as the devices
are only interacting with the CPU this extra step is not strictly
needed.

So, single device cases can be fine as-is.

IMHO in the multi-device case the VMM should probably demand this
support from the migration drivers, otherwise it cannot know for sure
if it is safe.

A config option could override the block if the admin knows there is
no use case that causes devices to interact, e.g. two NVMe devices
without CMB do not have a useful interaction.

quoted

so it seems like we could make use of the bus-master bit to fill
this gap in QEMU currently, before we claim non-experimental
support, but this new device agnostic extension would be required
for non-PCI device support (and PCI support should adopt it as
available).  Does that sound right?  Thanks,

I don't think the bus master support is really a substitute, tripping
bus master will stop DMA but it will not do so in a clean way and is
likely to be non-transparent to the VM's driver.

The single-device-assigned case is a cleaner restriction, IMHO.

Alternatively we can add the 4th bit and insist that migration drivers
support all the states. I'm just unsure what other HW can do, I get
the feeling people have been designing to the migration description in
the header file for a while and this is a new idea.

I'm wondering if we're imposing extra requirements on the !_RUNNING
state that don't need to be there.  For example, if we can assume that
all devices within a userspace context are !_RUNNING before any of the
devices begin to retrieve final state, then clearing of the _RUNNING
bit becomes the device quiesce point and the beginning of reading
device data is the point at which the device state is frozen and
serialized.  No new states required and essentially works with a slight
rearrangement of the callbacks in this series.  Why can't we do that?

Maybe a clarification of the uAPI spec is sufficient to achieve this,
ex. !_RUNNING devices may still update their internal state machine
based on external access.  Userspace is expected to quiesce all external
access prior to initiating the retrieval of the final device state from
the data section of the migration region.  Failure to do so may result
in inconsistent device state or optionally the device driver may induce
a fault if a quiescent state is not maintained.
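As an illustration only, the ordering proposed above can be sketched
from the userspace side as two passes over the devices: first clear
_RUNNING everywhere, and only then start reading final state.  All
names here (example_event, example_log, example_stop_and_copy(),
example_order_ok()) are hypothetical and the "devices" are just
indices; no real vfio ioctl or region access is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event kinds recorded by the sketch. */
enum example_event_kind { EXAMPLE_STOP, EXAMPLE_READ_FINAL };

struct example_event {
	enum example_event_kind kind;
	size_t dev;
};

#define EXAMPLE_LOG_MAX 16

struct example_log {
	struct example_event ev[EXAMPLE_LOG_MAX];
	size_t len;
};

static void example_log_add(struct example_log *log,
			    enum example_event_kind kind, size_t dev)
{
	assert(log->len < EXAMPLE_LOG_MAX);
	log->ev[log->len].kind = kind;
	log->ev[log->len].dev = dev;
	log->len++;
}

/*
 * Pass 1: clear _RUNNING on every device (the quiesce point).
 * Pass 2: begin reading device data (the freeze/serialize point).
 */
static void example_stop_and_copy(struct example_log *log, size_t ndevs)
{
	for (size_t i = 0; i < ndevs; i++)
		example_log_add(log, EXAMPLE_STOP, i);
	for (size_t i = 0; i < ndevs; i++)
		example_log_add(log, EXAMPLE_READ_FINAL, i);
}

/* Returns 1 if no device was stopped after a final-state read began. */
static int example_order_ok(const struct example_log *log)
{
	int seen_read = 0;

	for (size_t i = 0; i < log->len; i++) {
		if (log->ev[i].kind == EXAMPLE_READ_FINAL)
			seen_read = 1;
		else if (seen_read)
			return 0;
	}
	return 1;
}
```

A per-device loop that stops and reads device 0 before stopping
device 1 would fail example_order_ok(), which is the interleaving the
clarified uAPI wording would tell userspace to avoid.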

Just to be sure,

We refer here to some future functionality supported by this extra 4th
bit, but it doesn't force any change in the submitted code, right?

The code below uses the (state & ~MLX5VF_SUPPORTED_DEVICE_STATES)
clause, which rejects any use of an unsupported bit such as this one.

if (old_state == VFIO_DEVICE_STATE_ERROR ||
    !VFIO_DEVICE_STATE_VALID(state) ||
    (state & ~MLX5VF_SUPPORTED_DEVICE_STATES))
        return -EINVAL;

Correct, userspace shouldn't be setting any extra bits unless we
advertise support, such as via a capability or flag.  Drivers need to
continue to sanitize user input to validate yet-to-be-defined bits are
not accepted from userspace or else we risk not being able to define
them later without breaking userspace.  Thanks,

Alex
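A small userspace sketch of the input sanitizing discussed in this
message, under stated assumptions: the bit values mirror the migration
uAPI, EXAMPLE_SUPPORTED_DEVICE_STATES stands in for the driver's
MLX5VF_SUPPORTED_DEVICE_STATES (its exact value is assumed to be the
three existing bits), and EXAMPLE_STATE_NDMA is a made-up position for
the not-yet-defined 4th bit.  It shows that the VALID macro alone does
not reject an unknown bit; the supported-states mask does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Local mirrors of the device_state bits (assumed values for this sketch). */
#define VFIO_DEVICE_STATE_RUNNING  (1u << 0)
#define VFIO_DEVICE_STATE_SAVING   (1u << 1)
#define VFIO_DEVICE_STATE_RESUMING (1u << 2)
#define VFIO_DEVICE_STATE_MASK     (VFIO_DEVICE_STATE_RUNNING | \
                                    VFIO_DEVICE_STATE_SAVING | \
                                    VFIO_DEVICE_STATE_RESUMING)
#define VFIO_DEVICE_STATE_ERROR    (VFIO_DEVICE_STATE_SAVING | \
                                    VFIO_DEVICE_STATE_RESUMING)
#define VFIO_DEVICE_STATE_VALID(state) \
	((state) & VFIO_DEVICE_STATE_RESUMING ? \
	 ((state) & VFIO_DEVICE_STATE_MASK) == VFIO_DEVICE_STATE_RESUMING : 1)

/* Hypothetical: no such bit exists yet; the position is invented. */
#define EXAMPLE_STATE_NDMA (1u << 3)

/* Assumed stand-in for the driver's supported-states mask. */
#define EXAMPLE_SUPPORTED_DEVICE_STATES VFIO_DEVICE_STATE_MASK

/* Same three clauses as the check quoted in the message. */
static int example_check_state(uint32_t old_state, uint32_t state)
{
	if (old_state == VFIO_DEVICE_STATE_ERROR ||
	    !VFIO_DEVICE_STATE_VALID(state) ||
	    (state & ~EXAMPLE_SUPPORTED_DEVICE_STATES))
		return -EINVAL;
	return 0;
}

void self_check(void)
{
	/* VALID lets the unknown bit through; the mask clause catches it. */
	assert(VFIO_DEVICE_STATE_VALID(VFIO_DEVICE_STATE_RUNNING |
				       EXAMPLE_STATE_NDMA));
	assert(example_check_state(VFIO_DEVICE_STATE_RUNNING,
				   VFIO_DEVICE_STATE_RUNNING |
				   EXAMPLE_STATE_NDMA) == -EINVAL);
}
```

If a later kernel advertised such a bit through a capability or flag,
only the supported-states mask would need to grow; userspace that never
set the bit would be unaffected.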
