Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices

[PATCH V2 mlx5-next 00/14] Add mlx5 live migration driver · Yishai Hadas <yishaih@nvidia.com> · 2021-10-19
[PATCH V2 mlx5-next 01/14] PCI/IOV: Add pci_iov_vf_id() to get VF index · Yishai Hadas <yishaih@nvidia.com> · 2021-10-19
[PATCH V2 mlx5-next 02/14] net/mlx5: Reuse exported virtfn index function call · Yishai Hadas <yishaih@nvidia.com> · 2021-10-19
[PATCH V2 mlx5-next 04/14] PCI/IOV: Add pci_iov_get_pf_drvdata() to allow VF reaching the drvdata of a PF · Yishai Hadas <yishaih@nvidia.com> · 2021-10-19
[PATCH V2 mlx5-next 05/14] net/mlx5: Expose APIs to get/put the mlx5 core device · Yishai Hadas <yishaih@nvidia.com> · 2021-10-19
[PATCH V2 mlx5-next 03/14] net/mlx5: Disable SRIOV before PF removal · Yishai Hadas <yishaih@nvidia.com> · 2021-10-19
[PATCH V2 mlx5-next 06/14] vdpa/mlx5: Use mlx5_vf_get_core_dev() to get PF device · Yishai Hadas <yishaih@nvidia.com> · 2021-10-19
Re: [PATCH V2 mlx5-next 06/14] vdpa/mlx5: Use mlx5_vf_get_core_dev() to get PF device · Max Gurtovoy <mgurtovoy@nvidia.com> · 2021-10-19
Re: [PATCH V2 mlx5-next 06/14] vdpa/mlx5: Use mlx5_vf_get_core_dev() to get PF device · Yishai Hadas <yishaih@nvidia.com> · 2021-10-20
[PATCH V2 mlx5-next 07/14] vfio: Fix VFIO_DEVICE_STATE_SET_ERROR macro · Yishai Hadas <yishaih@nvidia.com> · 2021-10-19
[PATCH V2 mlx5-next 08/14] vfio: Add a macro for VFIO_DEVICE_STATE_ERROR · Yishai Hadas <yishaih@nvidia.com> · 2021-10-19
Re: [PATCH V2 mlx5-next 08/14] vfio: Add a macro for VFIO_DEVICE_STATE_ERROR · Alex Williamson <hidden> · 2021-10-19
Re: [PATCH V2 mlx5-next 08/14] vfio: Add a macro for VFIO_DEVICE_STATE_ERROR · Alex Williamson <hidden> · 2021-10-19
Re: [PATCH V2 mlx5-next 08/14] vfio: Add a macro for VFIO_DEVICE_STATE_ERROR · Yishai Hadas <yishaih@nvidia.com> · 2021-10-20
[PATCH V2 mlx5-next 09/14] vfio/pci_core: Make the region->release() function optional · Yishai Hadas <yishaih@nvidia.com> · 2021-10-19
[PATCH V2 mlx5-next 10/14] net/mlx5: Introduce migration bits and structures · Yishai Hadas <yishaih@nvidia.com> · 2021-10-19
[PATCH V2 mlx5-next 11/14] vfio/mlx5: Expose migration commands over mlx5 device · Yishai Hadas <yishaih@nvidia.com> · 2021-10-19
[PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Yishai Hadas <yishaih@nvidia.com> · 2021-10-19
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-10-19
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-19
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-10-19
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-19
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Yishai Hadas <yishaih@nvidia.com> · 2021-10-20
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-10-20
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-20
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-10-20
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Cornelia Huck <cohuck@redhat.com> · 2021-10-21
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-10-21
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-25
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-10-25
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-25
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-10-26
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-26
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-10-26
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-26
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-10-27
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-27
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Cornelia Huck <cohuck@redhat.com> · 2021-10-28
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-29
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Yishai Hadas <yishaih@nvidia.com> · 2021-10-29
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-10-28
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-28
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Cornelia Huck <cohuck@redhat.com> · 2021-10-29
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Yishai Hadas <yishaih@nvidia.com> · 2021-10-29
RE: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Shameerali Kolothum Thodi <hidden> · 2021-10-29
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-29
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-10-29
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-11-01
RE: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Shameerali Kolothum Thodi <hidden> · 2021-11-02
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-11-02
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-11-02
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-11-02
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-11-02
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-11-02
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-11-03
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-11-03
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-11-03
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-11-03
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Cornelia Huck <cohuck@redhat.com> · 2021-11-04
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Cornelia Huck <cohuck@redhat.com> · 2021-11-05
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Cornelia Huck <cohuck@redhat.com> · 2021-11-16
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-11-05
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-11-05
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-11-15
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-11-16
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-11-16
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-11-16
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-11-17
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-11-18
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-11-22
RE: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · "Tian, Kevin" <kevin.tian@intel.com> · 2021-11-08
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-11-08
RE: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · "Tian, Kevin" <kevin.tian@intel.com> · 2021-11-09
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-11-09
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Dr. David Alan Gilbert <hidden> · 2021-10-25
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-10-25
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Dr. David Alan Gilbert <hidden> · 2021-10-25
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-25
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Dr. David Alan Gilbert <hidden> · 2021-10-26
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-26
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-10-26
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-26
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Alex Williamson <hidden> · 2021-10-26
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Dr. David Alan Gilbert <hidden> · 2021-10-26
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-26
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Yishai Hadas <yishaih@nvidia.com> · 2021-10-20
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-20
Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices · Yishai Hadas <yishaih@nvidia.com> · 2021-10-21
[PATCH V2 mlx5-next 13/14] vfio/pci: Expose vfio_pci_aer_err_detected() · Yishai Hadas <yishaih@nvidia.com> · 2021-10-19
[PATCH V2 mlx5-next 14/14] vfio/mlx5: Use its own PCI reset_done error handler · Yishai Hadas <yishaih@nvidia.com> · 2021-10-19
Re: [PATCH V2 mlx5-next 14/14] vfio/mlx5: Use its own PCI reset_done error handler · Alex Williamson <hidden> · 2021-10-19
Re: [PATCH V2 mlx5-next 14/14] vfio/mlx5: Use its own PCI reset_done error handler · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-19
Re: [PATCH V2 mlx5-next 14/14] vfio/mlx5: Use its own PCI reset_done error handler · Yishai Hadas <yishaih@nvidia.com> · 2021-10-20
Re: [PATCH V2 mlx5-next 14/14] vfio/mlx5: Use its own PCI reset_done error handler · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-20
Re: [PATCH V2 mlx5-next 14/14] vfio/mlx5: Use its own PCI reset_done error handler · Alex Williamson <hidden> · 2021-10-20
Re: [PATCH V2 mlx5-next 14/14] vfio/mlx5: Use its own PCI reset_done error handler · Jason Gunthorpe <jgg@nvidia.com> · 2021-10-20
Re: [PATCH V2 mlx5-next 14/14] vfio/mlx5: Use its own PCI reset_done error handler · Alex Williamson <hidden> · 2021-10-20
Re: [PATCH V2 mlx5-next 14/14] vfio/mlx5: Use its own PCI reset_done error handler · Yishai Hadas <yishaih@nvidia.com> · 2021-10-21
vfio migration discussions (was: [PATCH V2 mlx5-next 00/14] Add mlx5 live migration driver) · Cornelia Huck <cohuck@redhat.com> · 2021-11-17
Re: vfio migration discussions (was: [PATCH V2 mlx5-next 00/14] Add mlx5 live migration driver) · Jason Gunthorpe <jgg@nvidia.com> · 2021-11-17

From: Alex Williamson <hidden>
Date: 2021-11-02 14:57:33
Also in: kvm, linux-pci

On Mon, 1 Nov 2021 14:25:06 -0300
Jason Gunthorpe [off-list ref] wrote:

On Fri, Oct 29, 2021 at 04:06:21PM -0600, Alex Williamson wrote:

quoted

Right now we are focused on the non-P2P cases, which I think is a
reasonable starting limitation.

It's a reasonable starting point iff we know that we need to support
devices that cannot themselves support a quiescent state.  Otherwise it
would make sense to go back to work on the uAPI because I suspect the
implications to userspace are not going to be as simple as "oops, can't
migrate, there are two devices."  As you say, there's a universe of
devices that run together that don't care about p2p and QEMU will be
pressured to support migration of those configurations.

I agree with this, but I also think what I saw in the proposed hns
driver suggests it's HW cannot do quiescent, if so this is the first
counter-example to the notion it is a universal ability?

hns people: Can you put your device in a state where it is operating,
able to accept and respond to MMIO, and yet guarentees it generates no
DMA transactions?

quoted

want migration.  If we ever want both migration and p2p, QEMU would
need to reject any device that can't comply.

Yes, it looks like a complicated task on the qemu side to get this
resolved

quoted

It is not a big deal to defer things to rc1, though merging a
leaf-driver that has been on-list over a month is certainly not
rushing either.

If "on-list over a month" is meant to imply that it's well vetted, it
does not.  That's a pretty quick time frame given the uAPI viability
discussions that it's generated.

I only said rushed :)

To push forward regardless of unresolved questions is rushing
regardless of how long it's been on-list.

quoted

I'm tending to agree that there's value in moving forward, but there's
a lot we're defining here that's not in the uAPI, so I'd like to see
those things become formalized.

Ok, lets come up with a documentation patch then to define !RUNNING as
I outlined and start to come up with the allowed list of actions..

I think I would like to have a proper rst file for documenting the
uapi as well.

quoted

I think this version is defining that it's the user's responsibility to
prevent external DMA to devices while in the !_RUNNING state.  This
resolves the condition that we have no means to coordinate quiescing
multiple devices.  We shouldn't necessarily prescribe a single device
solution in the uAPI if the same can be equally achieved through
configuration of DMA mapping.

I'm not sure what this means?

I'm just trying to avoid the uAPI calling out a single-device
restriction if there are other ways that userspace can quiesce external
DMA outside of the uAPI, such as by limiting p2p DMA mappings at the
IOMMU, ie. define the userspace requirements but don't dictate a
specific solution.

quoted

I was almost on board with blocking MMIO, especially as p2p is just DMA
mapping of MMIO, but what about MSI-X?  During _RESUME we must access
the MSI-X vector table via the SET_IRQS ioctl to configure interrupts.
Is this exempt because the access occurs in the host?

s/in the host/in the kernel/ SET_IRQS is a kernel ioctl that uses the
core MSIX code to do the mmio, so it would not be impacted by MMIO
zap.

AIUI, "zap" is just the proposed userspace manifestation that the
device cannot accept MMIO writes while !_RUNNING, but these writes must
occur in that state.

Looks like you've already marked these points with the
vfio_pci_memory_lock_and_enable(), so a zap for migration would have
to be a little different than a zap for reset.

Still, this is something that needs clear definition, I would expect
the SET_IRQS to happen after resuming clears but before running sets
to give maximum HW flexibility and symmetry with saving.

There's no requirement that the device enters a null state (!_RESUMING
| !_SAVING | !_RUNNING), the uAPI even species the flows as _RESUMING
transitioning to _RUNNING.  There's no point at which we can do
SET_IRQS other than in the _RESUMING state.  Generally SET_IRQS
ioctls are coordinated with the guest driver based on actions to the
device, we can't be mucking with IRQs while the device is presumed
running and already generating interrupt conditions.

And we should really define clearly what a device is supposed to do
with the interrupt vectors during migration. Obviously there are races
here.

The device should not be generating interrupts while !_RUNNING, pending
interrupts should be held until the device is _RUNNING.  To me this
means the sequence must be that INTx/MSI/MSI-X are restored while in
the !_RUNNING state.

quoted

In any case, it requires that the device cannot be absolutely static
while !_RUNNING.  Does (_RESUMING) have different rules than
(_SAVING)?

I'd prever to avoid all device touches during both resuming and
saving, and do them during !RUNNING

There's no such state required by the uAPI.

quoted

So I'm still unclear how the uAPI needs to be updated relative to
region access.  We need that list of what the user is allowed to
access, which seems like minimally config space and MSI-X table space,
but are these implicitly known for vfio-pci devices or do we need
region flags or capabilities to describe?  We can't generally know the
disposition of device specific regions relative to this access.  Thanks,

I'd prefer to be general and have the spec forbid
everything. Specifying things like VFIO_DEVICE_SET_IRQS1 covers all the
bus types.

AFAICT, SET_IRQS while _RESUMING is a requirement, as is some degree of
access to config space.  It seems you're proposing a new required null
state which is contradictory to the existing uAPI.  Thanks,

Alex

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help