Thread: 100 messages, 8 authors, 2021-11-22

RE: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices

From: "Tian, Kevin" <kevin.tian@intel.com>
Date: 2021-11-08 08:53:28
Also in: kvm, linux-pci

From: Jason Gunthorpe <jgg@nvidia.com>
Sent: Tuesday, October 26, 2021 11:19 PM

On Tue, Oct 26, 2021 at 08:42:12AM -0600, Alex Williamson wrote:
This is also why I don't like it being so transparent as it is
something userspace needs to care about - especially if the HW cannot
support such a thing, if we intend to allow that.
Userspace does need to care, but userspace's concern over this should
not be able to compromise the platform and therefore making VF
assignment more susceptible to fatal error conditions to comply with a
migration uAPI is troublesome for me.
It is an interesting scenario.

I think it points out that we are not implementing this fully properly.

The !RUNNING state should be like your reset efforts.

All access to the MMIO memories from userspace should be revoked
during !RUNNING
This assumes that vCPUs must be stopped before !RUNNING is entered
in the virtualization case, and that is true today.

But it may not hold when talking about guest SVA and I/O page fault [1].
The problem is that the pending requests may trigger I/O page faults
on guest page tables. Without running vCPUs to handle those faults, the
quiesce command cannot complete draining the pending requests
if the device doesn't support preempt-on-fault (at least it's the case for
some Intel and Huawei devices, possibly true for most initial SVA
implementations). 
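
To make the dependency concrete, here is a minimal userspace sketch of the condition described above. It is only a model, not code from any driver: the names sketch_req and sketch_quiesce_completes are hypothetical, and a real device would report this through its own quiesce command rather than a helper like this.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one pending device request. */
struct sketch_req {
    bool needs_fault;   /* request will raise an I/O page fault on guest page tables */
};

/*
 * Returns true if draining all pending requests can finish.
 * A faulting request can only make progress if vCPUs are still running
 * to service the fault, or if the device can preempt the request on fault.
 */
bool sketch_quiesce_completes(const struct sketch_req *reqs, size_t n,
                              bool vcpus_running, bool preempt_on_fault)
{
    for (size_t i = 0; i < n; i++) {
        if (reqs[i].needs_fault && !vcpus_running && !preempt_on_fault)
            return false;   /* drain stalls on this request */
    }
    return true;
}
```

In this model, stopping vCPUs before quiescing is only safe when no pending request faults or when the device supports preempt-on-fault.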

Of course migrating guest SVA requires more changes as discussed in [1]. 
Here I just want to point out this forward-looking requirement so that any
definition change in this thread won't break that usage.

[1] https://lore.kernel.org/qemu-devel/06cb5bfd-f6f8-b61b-1a7e-60a9ae2f8fac@nvidia.com/T/
(p.s. 'stop device' in [1] means 'quiesce device' in this thread)

Thanks,
Kevin
All VMAs zap'd.

All IOMMU peer mappings invalidated.

The kernel should directly block userspace from causing an MMIO TLP
before the device driver goes to !RUNNING.

Then the question of what the device does at this edge is not
relevant as hostile userspace cannot trigger it.

The logical way to implement this is to key off running and
block/unblock MMIO access when !RUNNING.

To me this strongly suggests that the extra bit is the correct way
forward as the driver is much simpler to implement and understand if
RUNNING directly controls the availability of MMIO instead of having
an irregular case where !RUNNING still allows MMIO but only until a
pending_bytes read.
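
A minimal userspace sketch of that idea, assuming a single RUNNING bit is the only thing that decides whether MMIO is allowed. This is not the mlx5 or vfio implementation; SKETCH_STATE_RUNNING, struct sketch_dev, sketch_set_state and sketch_mmio_write are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state bit for illustration; not the vfio uAPI definition. */
#define SKETCH_STATE_RUNNING (1u << 0)

struct sketch_dev {
    uint32_t state;      /* current device state bits */
    bool mmio_blocked;   /* true while userspace MMIO is revoked */
    uint32_t reg;        /* stand-in for a device register */
};

/* State changes are the only place where MMIO availability changes. */
void sketch_set_state(struct sketch_dev *d, uint32_t new_state)
{
    bool running = (new_state & SKETCH_STATE_RUNNING) != 0;

    if (!running)
        d->mmio_blocked = true;    /* block before the device goes !RUNNING */
    d->state = new_state;
    if (running)
        d->mmio_blocked = false;   /* unblock only once RUNNING again */
}

/* Models a userspace MMIO write; refused while access is blocked. */
int sketch_mmio_write(struct sketch_dev *d, uint32_t val)
{
    if (d->mmio_blocked)
        return -EACCES;
    d->reg = val;
    return 0;
}
```

With this shape there is no intermediate window in which the device is !RUNNING but MMIO still reaches it, which is the irregular case the paragraph above argues against.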

Given the complexity of this can we move ahead with the current
mlx5_vfio and Yishai&co can come with some followup proposal to split
the freeze/quiesce and block MMIO?

Jason