Thread (100 messages) 100 messages, 8 authors, 2021-11-22

Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices

From: Alex Williamson <hidden>
Date: 2021-11-05 15:31:54
Also in: kvm, linux-pci

On Fri, 5 Nov 2021 10:24:04 -0300
Jason Gunthorpe [off-list ref] wrote:
On Wed, Nov 03, 2021 at 12:04:11PM -0600, Alex Williamson wrote:
quoted
We agreed that it's easier to add a feature than a restriction in a
uAPI, so how do we resolve that some future device may require a new
state in order to apply the SET_IRQS configuration?  
I would say don't support those devices. If there is even a hint that
they could maybe exist then we should fix it now. Once the uapi is set
and documented we should expect device makers to consider it when
building their devices.

As for SET_IRQs, I have been looking at making documentation and I
don't like the way the documentation has to be wrriten because of
this.

What I see as an understandable, clear, documentation is:

 - SAVING set - no device touches allowed beyond migration operations
   and reset via XX
I'd suggest defining reset via ioctl only.
   Must be set with !RUNNING
Not sure what this means.  Pre-copy requires SAVING and RUNNING
together, is this only suggesting that to get the final device state we
need to do so in a !RUNNING state?
 - RESUMING set - same as SAVING
I take it then that we're defining a new protocol if we can't do
SET_IRQS here.
 - RUNNING cleared - limited device touches in this list: SET_IRQs, XX
   config, XX.
   Device may assume no touches outside the above. (ie no MMIO)
   Implies NDMA
SET_IRQS is MMIO, is the distinction userspace vs kernel?
 - NDMA set - full device touches
   Device may not issue DMA or interrupts (??)
   Device may not dirty pages
Is this achievable?  We can't bound the time where incoming DMA is
possible, devices don't have infinite buffers.
 - RUNNING set - full functionality
 * In no state may a device generate an error TLP, device
   hang/integrity failure or kernel intergity failure, no matter
   what userspace does.
   The device is permitted to corrupt the migration/VM or SEGV
   userspace if userspace doesn't follow the rules.

(we are trying to figure out what the XX's are right now, would
appreciate any help)

This is something I think we could expect a HW engineering team to
follow and implement in devices. It doesn't complicate things.

Overall, at this moment, I would prioritize documentation clarity over
strict compatability with qemu, because people have to follow this
documentation and make their devices long into the future. If the
documentation is convoluted for compatibility reasons HW people are
more likely to get it wrong. When HW people get it wrong they are more
likely to ask for "quirks" in the uAPI to fix their mistakes.
I might still suggest a v2 migration sub-type, we'll just immediately
deprecate the original as we have no users and QEMU would modify all
support to find only the new sub-type as code is updated.  "v1" never
really materialized, but we can avoid future confusion if it's never
produced by in-tree drivers and never consume by mainstream userspace.
The pending_bytes P2P idea is also quite complicated to document as
now we have to describe an HW state not in terms of a NDMA control
bit, but in terms of a bunch of implicit operations in a protocol. Not
so nice.

So, here is what I propose. Let us work on some documentation and come
up with the sort of HW centric docs like above and we can then decide
if we want to make the qemu changes it will imply, or not. We'll
include the P2P stuff, as we see it, so it shows a whole picture.

I think that will help everyone participate fully in the discussion.
Good plan.
quoted
If we're going to move forward with the existing uAPI, then we're going
to need to start factoring compatibility into our discussions of
missing states and protocols.  For example, requiring that the device
is "quiesced" when the _RUNNING bit is cleared and "frozen" when
pending_bytes is read has certain compatibility advantages versus
defining a new state bit.   
Not entirely, to support P2P going from RESUMING directly to RUNNING
is not possible. There must be an in between state that all devices
reach before they go to RUNNING. It seems P2P cannot be bolted into
the existing qmeu flow with a kernel only change?
Perhaps, yes.
quoted
clarifications were trying for within the existing uAPI rather than
toss out new device states and protocols at every turn for the sake of
API purity.  The rate at which we're proposing new states and required
transitions without a plan for the uAPI is not where I want to be for
adding the driver that could lock us in to a supported uAPI.  Thanks,  
Well, to be fair, the other cases I suggested new stats was when you
asked about features we don't have at all today (like post-copy). I
think adding new states is a very reasonable way to approach adding
new features. As long as new features can be supported with new states
we have a forward compatability story.
That has a viable upgrade path, I'm onboard with that.  A device that
imposes it can't do SET_IRQS while RESUMING when we have no required
state in between RESUMING and RUNNING are the sorts of issues that I'm
going to get hung up on.  I take it from the above that you're building
that state transition requirement into the uAPI now.  Thanks,

Alex
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help