Thread (43 messages) 43 messages, 5 authors, 2024-07-29

Re: [PATCH] ptp: Add vDSO-style vmclock support

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: 2024-07-25 16:38:32
Also in: linux-arm-kernel, linux-rtc, lkml, qemu-devel, virtualization

On Thu, Jul 25, 2024 at 04:18:43PM +0100, David Woodhouse wrote:
On Thu, 2024-07-25 at 10:11 -0400, Michael S. Tsirkin wrote:
quoted
On Thu, Jul 25, 2024 at 02:50:50PM +0100, David Woodhouse wrote:
quoted
Even if the virtio-rtc specification were official today, and I was
able to expose it via PCI, I probably wouldn't do it that way. There's
just far more in virtio-rtc than we need; the simple shared memory
region is perfectly sufficient for most needs, and especially ours.
I can't stop amazon from shipping whatever in its hypervisor,
I'd just like to understand this better, if there is a use-case
not addressed here then we can change virtio to address it.

The rtc driver patch posted is 900 lines, yours is 700 lines, does not
look like a big difference.  As for using a memory region, this is
valid, but maybe rtc should be changed to do exactly that?
I'm certainly aiming for virtio-rtc to include that as an *option*,
because I think I don't think it makes sense for an RTC specification
aimed at virtual machines *not* to deal with the live migration
problem.

AFAICT the only ways to deal with the LM problem are either to make a
hypercall/virtio transaction for *every* clock read which needs to be
accurate, or expose a memory region for the guest to do it "vDSO-
style".
virtio can support the second option, we already have
VIRTIO_PCI_CAP_SHARED_MEMORY_CFG, I'd just use it.

And similarly, unless we want guest userspace to have to make a
*system* call every time, that memory region needs to be mappable all
the way to userspace.
This part is classic for pci, mapping pci bar has been well
studied.

The use case isn't necessarily for all users of gettimeofday(), of
course; this is for those applications which *need* precision time.
Like distributed databases which rely on timestamps for coherency, and
users who get fined millions of dollars when LM messes up their clocks
and they put wrong timestamps on financial transactions.
I would however worry that with all this pass through,
applications have to be coded to each hypervisor or even
version of the hypervisor.

I don't really know the use-case well enough - is sending
an interrupt to linux and having linux create a device
independent structure not workable?

quoted
E.g. we can easily add a capability describing such a region.
or put it in device config space.
I think it has to be memory, not config space. But yes.
virtio config space, which is just a region in a BAR.
But yes, maybe VIRTIO_PCI_CAP_SHARED_MEMORY_CFG is cleaner.
The intent is that my driver would be usable with the shared memory
region from a virtio-rtc device too. It'd need a tiny amount of
refactoring of the discovery code in vmclock_probe(), which I haven't
done yet as it would be premature optimisation. 
quoted
I mean yes, we can build a new transport for each specific need but in
the end we'll get a ton of interfaces with unclear compatibility
requirements.  If effort is instead spent improving common interfaces,
we get consistency and everyone benefits. That's why I'm trying to
understand the need here.
It's simplicity. Because this isn't even a "transport". It's just a
simple breadcrumb given to the guest to tell it where the information
is.
In the fullness of time assuming this becomes part of virtio-rtc too,
the fact that it can *also* be discovered by ACPI is just a tiny
detail. And it allows hypervisors to implement it a *whole* lot more
simply.

The addition of an ACPI method to enable the timekeeping does make it a
tiny bit more than a 'breadcrump', I concede — but that's still
basically trivial to implement. A whole lot simpler than a full virtio
device.
virtio has been developed with the painful experience that we keep
making mistakes, or coming up with new needed features,
and that maintaining forward and backward compatibility
becomes a whole lot harder than it seems in the beginning.

-- 
MST
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help