RE: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices
From: Shameerali Kolothum Thodi <hidden>
Date: 2021-11-02 11:19:30
Also in:
kvm, linux-pci
> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgg@nvidia.com]
> Sent: 01 November 2021 17:25
> To: Alex Williamson <redacted>; Shameerali Kolothum Thodi [off-list ref]
> Cc: Cornelia Huck <cohuck@redhat.com>; Yishai Hadas <yishaih@nvidia.com>;
> bhelgaas@google.com; saeedm@nvidia.com; linux-pci@vger.kernel.org;
> kvm@vger.kernel.org; netdev@vger.kernel.org; kuba@kernel.org;
> leonro@nvidia.com; kwankhede@nvidia.com; mgurtovoy@nvidia.com;
> maorg@nvidia.com; Dr. David Alan Gilbert [off-list ref]
> Subject: Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver
> for mlx5 devices
>
> On Fri, Oct 29, 2021 at 04:06:21PM -0600, Alex Williamson wrote:
> > > Right now we are focused on the non-P2P cases, which I think is a
> > > reasonable starting limitation.
> >
> > It's a reasonable starting point iff we know that we need to support
> > devices that cannot themselves support a quiescent state. Otherwise it
> > would make sense to go back to work on the uAPI because I suspect the
> > implications to userspace are not going to be as simple as "oops, can't
> > migrate, there are two devices." As you say, there's a universe of
> > devices that run together that don't care about p2p and QEMU will be
> > pressured to support migration of those configurations.
>
> I agree with this, but I also think what I saw in the proposed hns driver
> suggests its HW cannot do quiescent, if so this is the first
> counter-example to the notion it is a universal ability?
>
> hns people: Can you put your device in a state where it is operating,
> able to accept and respond to MMIO, and yet guarantees it generates no
> DMA transactions?

AFAIK, I am afraid we cannot guarantee that with our current
implementation. At present, in the !RUNNING state we put the device into a
PAUSE state, so it completes the current request and keeps the remaining
ones in the queue. But it can still receive a new request, which will
trigger an exit from the PAUSE state and resume operation.

So I guess it is possible to corrupt the migration if user space
misbehaves.

Thanks,
Shameer
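
For illustration only, the PAUSE behaviour described above can be sketched as a toy state machine. This is not hns driver code: every name here (toy_dev, toy_pause, toy_submit and so on) is hypothetical, and the model assumes one DMA transaction per processed request. It only shows why a new request arriving in the paused state breaks the "no DMA" guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: all names are hypothetical, not taken from the hns driver. */
enum toy_state { TOY_RUNNING, TOY_PAUSED };

struct toy_dev {
	enum toy_state state;
	unsigned int queued;    /* requests waiting in the queue */
	unsigned int dma_count; /* DMA transactions issued so far */
};

/* Process one queued request; assumed to issue exactly one DMA. */
static void toy_process_one(struct toy_dev *d)
{
	assert(d->queued > 0);
	d->queued--;
	d->dma_count++;
}

/* Enter PAUSE: finish the current request, leave the rest queued. */
static void toy_pause(struct toy_dev *d)
{
	if (d->queued > 0)
		toy_process_one(d);
	d->state = TOY_PAUSED;
}

/*
 * A new request arrives. In this model it forces an exit from PAUSE and
 * the device drains its whole queue, issuing DMA while it was meant to
 * be stopped.
 */
static void toy_submit(struct toy_dev *d)
{
	d->queued++;
	d->state = TOY_RUNNING;
	while (d->queued > 0)
		toy_process_one(d);
}

static bool toy_is_paused(const struct toy_dev *d)
{
	return d->state == TOY_PAUSED;
}
```

In this sketch, calling toy_pause() on a device with three queued requests leaves two in the queue, and a later toy_submit() makes the device leave PAUSE and issue DMA for all outstanding requests, which is the window in which a misbehaving user space could corrupt the migration.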