Re: Re: [Qemu-devel] [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net
From: Cornelia Huck <cohuck@redhat.com>
Date: 2018-06-15 11:48:15
Also in:
qemu-devel
On Thu, 14 Jun 2018 18:57:11 -0700 Siwei Liu [off-list ref] wrote:
Thank you for sharing your thoughts, Cornelia. With questions below, I think you raised really good points, some of which I don't have answer yet and would also like to explore here. First off, I don't want to push the discussion to the extreme at this point, or sell anything about having QEMU manage everything automatically. Don't get me wrong, it's not there yet. Let's don't assume we are tied to a specific or concerte solution. I think the key for our discussion might be to define or refine the boundary between VM and guest, e.g. what each layer is expected to control and manage exactly. In my view, there might be possibly 3 different options to represent the failover device conceipt to QEMU and libvirt (or any upper layer software): a. Seperate device: in this model, virtio and passthough remains separate devices just as today. QEMU exposes the standby feature bit for virtio, and publish status/event around the negotiation process of this feature bit for libvirt to react upon. Since Libvirt has the pairing relationship itself, maybe through MAC address or something else, it can control the presence of primary by hot plugging or unplugging the passthrough device, although it has to work tightly with virtio's feature negotation process. Not just for migration but also various corner scenarios (driver/feature ok, device reset, reboot, legacy guest etc) along virtio's feature negotiation.
Yes, that one has obvious tie-ins to virtio's modus operandi.
b. Coupled device: in this model, virtio and passthough devices are weakly coupled using some group ID, i.e. QEMU match the passthough device for a standby virtio instance by comparing the group ID value present behind each device's bridge. Libvirt provides QEMU the group ID for both type of devices, and only deals with hot plug for migration, by checking some migration status exposed (e.g. the feature negotiation status on the virtio device) by QEMU. QEMU manages the visibility of the primary in guest along virtio's feature negotiation process.
I'm a bit confused here. What, exactly, ties the two devices together? If libvirt already has the knowledge that it should manage the two as a couple, why do we need the group id (or something else for other architectures)? (Maybe I'm simply missing something because I'm not that familiar with pci.)
c. Fully combined device: in this model, virtio and passthough devices are viewed as a single VM interface altogther. QEMU not just controls the visibility of the primary in guest, but can also manage the exposure of the passthrough for migratability. It can be like that libvirt supplies the group ID to QEMU. Or libvirt does not even have to provide group ID for grouping the two devices, if just one single combined device is exposed by QEMU. In either case, QEMU manages all aspect of such internal construct, including virtio feature negotiation, presence of the primary, and live migration.
Same question as above.
It looks like to me that, in your opinion, you seem to prefer go with (a). While I'm actually okay with either (b) or (c). Do I understand your point correctly?
I'm not yet preferring anything, as I'm still trying to understand how this works :) I hope we can arrive at a model that covers the use case and that is also flexible enough to be extended to other platforms.
The reason that I feel that (a) might not be ideal, just as Michael alluded to (quoting below), is that as management stack, it really doesn't need to care about the detailed process of feature negotiation (if we view the guest presence of the primary as part of feature negotiation at an extended level not just virtio). All it needs to be done is to hand in the required devices to QEMU and that's all. Why do we need to addd various hooks, events for whichever happens internally within the guest? '' Primary device is added with a special "primary-failover" flag. A virtual machine is then initialized with just a standby virtio device. Primary is not yet added. Later QEMU detects that guest driver device set DRIVER_OK. It then exposes the primary device to the guest, and triggers a device addition event (hot-plug event) for it. If QEMU detects guest driver removal, it initiates a hot-unplug sequence to remove the primary driver. In particular, if QEMU detects guest re-initialization (e.g. by detecting guest reset) it immediately removes the primary device. '' and, '' management just wants to give the primary to guest and later take it back, it really does not care about the details of the process, so I don't see what does pushing it up the stack buy you. So I don't think it *needs* to be done in libvirt. It probably can if you add a bunch of hooks so it knows whenever vm reboots, driver binds and unbinds from device, and can check that backup flag was set. If you are pushing for a setup like that please get a buy-in from libvirt maintainers or better write a patch. ''
This actually seems to mean the opposite to me: We need to know what the guest is doing and when, as it directly drives what we need to do with the devices. If we switch to a visibility vs a hotplug model (see the other mail), we might be able to handle that part within qemu. However, I don't see how you get around needing libvirt to actually set this up in the first place and to handle migration per se.