Re: [PATCH V2 mlx5-next 12/14] vfio/mlx5: Implement vfio_pci driver for mlx5 devices
From: Alex Williamson <hidden>
Date: 2021-11-05 15:31:54
Also in:
kvm, linux-pci
On Fri, 5 Nov 2021 10:24:04 -0300 Jason Gunthorpe [off-list ref] wrote:
On Wed, Nov 03, 2021 at 12:04:11PM -0600, Alex Williamson wrote:quoted
We agreed that it's easier to add a feature than a restriction in a uAPI, so how do we resolve that some future device may require a new state in order to apply the SET_IRQS configuration?I would say don't support those devices. If there is even a hint that they could maybe exist then we should fix it now. Once the uapi is set and documented we should expect device makers to consider it when building their devices. As for SET_IRQs, I have been looking at making documentation and I don't like the way the documentation has to be wrriten because of this. What I see as an understandable, clear, documentation is: - SAVING set - no device touches allowed beyond migration operations and reset via XX
I'd suggest defining reset via ioctl only.
Must be set with !RUNNING
Not sure what this means. Pre-copy requires SAVING and RUNNING together, is this only suggesting that to get the final device state we need to do so in a !RUNNING state?
- RESUMING set - same as SAVING
I take it then that we're defining a new protocol if we can't do SET_IRQS here.
- RUNNING cleared - limited device touches in this list: SET_IRQs, XX config, XX. Device may assume no touches outside the above. (ie no MMIO) Implies NDMA
SET_IRQS is MMIO, is the distinction userspace vs kernel?
- NDMA set - full device touches Device may not issue DMA or interrupts (??) Device may not dirty pages
Is this achievable? We can't bound the time where incoming DMA is possible, devices don't have infinite buffers.
- RUNNING set - full functionality * In no state may a device generate an error TLP, device hang/integrity failure or kernel intergity failure, no matter what userspace does. The device is permitted to corrupt the migration/VM or SEGV userspace if userspace doesn't follow the rules. (we are trying to figure out what the XX's are right now, would appreciate any help) This is something I think we could expect a HW engineering team to follow and implement in devices. It doesn't complicate things. Overall, at this moment, I would prioritize documentation clarity over strict compatability with qemu, because people have to follow this documentation and make their devices long into the future. If the documentation is convoluted for compatibility reasons HW people are more likely to get it wrong. When HW people get it wrong they are more likely to ask for "quirks" in the uAPI to fix their mistakes.
I might still suggest a v2 migration sub-type, we'll just immediately deprecate the original as we have no users and QEMU would modify all support to find only the new sub-type as code is updated. "v1" never really materialized, but we can avoid future confusion if it's never produced by in-tree drivers and never consume by mainstream userspace.
The pending_bytes P2P idea is also quite complicated to document as now we have to describe an HW state not in terms of a NDMA control bit, but in terms of a bunch of implicit operations in a protocol. Not so nice. So, here is what I propose. Let us work on some documentation and come up with the sort of HW centric docs like above and we can then decide if we want to make the qemu changes it will imply, or not. We'll include the P2P stuff, as we see it, so it shows a whole picture. I think that will help everyone participate fully in the discussion.
Good plan.
quoted
If we're going to move forward with the existing uAPI, then we're going to need to start factoring compatibility into our discussions of missing states and protocols. For example, requiring that the device is "quiesced" when the _RUNNING bit is cleared and "frozen" when pending_bytes is read has certain compatibility advantages versus defining a new state bit.Not entirely, to support P2P going from RESUMING directly to RUNNING is not possible. There must be an in between state that all devices reach before they go to RUNNING. It seems P2P cannot be bolted into the existing qmeu flow with a kernel only change?
Perhaps, yes.
quoted
clarifications were trying for within the existing uAPI rather than toss out new device states and protocols at every turn for the sake of API purity. The rate at which we're proposing new states and required transitions without a plan for the uAPI is not where I want to be for adding the driver that could lock us in to a supported uAPI. Thanks,Well, to be fair, the other cases I suggested new stats was when you asked about features we don't have at all today (like post-copy). I think adding new states is a very reasonable way to approach adding new features. As long as new features can be supported with new states we have a forward compatability story.
That has a viable upgrade path, I'm onboard with that. A device that imposes it can't do SET_IRQS while RESUMING when we have no required state in between RESUMING and RUNNING are the sorts of issues that I'm going to get hung up on. I take it from the above that you're building that state transition requirement into the uAPI now. Thanks, Alex