Re: [PATCH] ptp: Add vDSO-style vmclock support
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2024-07-25 16:38:32
Also in:
linux-arm-kernel, linux-rtc, lkml, qemu-devel, virtualization
On Thu, Jul 25, 2024 at 04:18:43PM +0100, David Woodhouse wrote:
On Thu, 2024-07-25 at 10:11 -0400, Michael S. Tsirkin wrote:quoted
On Thu, Jul 25, 2024 at 02:50:50PM +0100, David Woodhouse wrote:quoted
Even if the virtio-rtc specification were official today, and I was able to expose it via PCI, I probably wouldn't do it that way. There's just far more in virtio-rtc than we need; the simple shared memory region is perfectly sufficient for most needs, and especially ours.I can't stop amazon from shipping whatever in its hypervisor, I'd just like to understand this better, if there is a use-case not addressed here then we can change virtio to address it. The rtc driver patch posted is 900 lines, yours is 700 lines, does not look like a big difference. As for using a memory region, this is valid, but maybe rtc should be changed to do exactly that?I'm certainly aiming for virtio-rtc to include that as an *option*, because I think I don't think it makes sense for an RTC specification aimed at virtual machines *not* to deal with the live migration problem. AFAICT the only ways to deal with the LM problem are either to make a hypercall/virtio transaction for *every* clock read which needs to be accurate, or expose a memory region for the guest to do it "vDSO- style".
virtio can support the second option, we already have VIRTIO_PCI_CAP_SHARED_MEMORY_CFG, I'd just use it.
And similarly, unless we want guest userspace to have to make a *system* call every time, that memory region needs to be mappable all the way to userspace.
This part is classic for pci, mapping pci bar has been well studied.
The use case isn't necessarily for all users of gettimeofday(), of course; this is for those applications which *need* precision time. Like distributed databases which rely on timestamps for coherency, and users who get fined millions of dollars when LM messes up their clocks and they put wrong timestamps on financial transactions.
I would however worry that with all this pass through, applications have to be coded to each hypervisor or even version of the hypervisor. I don't really know the use-case well enough - is sending an interrupt to linux and having linux create a device independent structure not workable?
quoted
E.g. we can easily add a capability describing such a region. or put it in device config space.I think it has to be memory, not config space. But yes.
virtio config space, which is just a region in a BAR. But yes, maybe VIRTIO_PCI_CAP_SHARED_MEMORY_CFG is cleaner.
The intent is that my driver would be usable with the shared memory region from a virtio-rtc device too. It'd need a tiny amount of refactoring of the discovery code in vmclock_probe(), which I haven't done yet as it would be premature optimisation.quoted
I mean yes, we can build a new transport for each specific need but in the end we'll get a ton of interfaces with unclear compatibility requirements. If effort is instead spent improving common interfaces, we get consistency and everyone benefits. That's why I'm trying to understand the need here.It's simplicity. Because this isn't even a "transport". It's just a simple breadcrumb given to the guest to tell it where the information is. In the fullness of time assuming this becomes part of virtio-rtc too, the fact that it can *also* be discovered by ACPI is just a tiny detail. And it allows hypervisors to implement it a *whole* lot more simply. The addition of an ACPI method to enable the timekeeping does make it a tiny bit more than a 'breadcrump', I concede — but that's still basically trivial to implement. A whole lot simpler than a full virtio device.
virtio has been developed with the painful experience that we keep making mistakes, or coming up with new needed features, and that maintaining forward and backward compatibility becomes a whole lot harder than it seems in the beginning. -- MST