Thread: 43 messages, 5 authors, 2024-07-29

Re: [PATCH] ptp: Add vDSO-style vmclock support

From: David Woodhouse <dwmw2@infradead.org>
Date: 2024-07-25 21:29:24
Also in: linux-arm-kernel, linux-rtc, lkml, qemu-devel, virtualization

On Thu, 2024-07-25 at 17:04 -0400, Michael S. Tsirkin wrote:
On Thu, Jul 25, 2024 at 10:00:24PM +0100, David Woodhouse wrote:
On Thu, 2024-07-25 at 16:50 -0400, Michael S. Tsirkin wrote:
On Thu, Jul 25, 2024 at 08:35:40PM +0100, David Woodhouse wrote:
On Thu, 2024-07-25 at 12:38 -0400, Michael S. Tsirkin wrote:
On Thu, Jul 25, 2024 at 04:18:43PM +0100, David Woodhouse wrote:
The use case isn't necessarily for all users of gettimeofday(), of
course; this is for those applications which *need* precision time.
Like distributed databases which rely on timestamps for coherency, and
users who get fined millions of dollars when LM messes up their clocks
and they put wrong timestamps on financial transactions.
I would however worry that with all this pass through,
applications have to be coded to each hypervisor or even
version of the hypervisor.
Yes, that would be a problem. Which is why I feel it's so important to
harmonise the contents of the shared memory, and I'm implementing it in
both QEMU and $DAYJOB, as well as aligning with virtio-rtc.

Writing an actual spec for this would be another thing that might help.
Potentially, although working over it with our internal clock team and
with Peter on virtio-rtc has put us in good shape. I'm confident now
that we have something that's viable and extensible enough.
virtio has been developed with the painful experience that we keep
making mistakes, or coming up with new needed features,
and that maintaining forward and backward compatibility
becomes a whole lot harder than it seems in the beginning.
Yes. But as you note, this shared memory structure is a userspace ABI
all of its own, so we get to make a completely *different* kind of
mistake :)

So, something I still don't completely understand.
Can't the VDSO thing be written to by kernel?
Let's say on LM, an interrupt triggers and kernel copies
data from a specific device to the VDSO.

Is that problematic somehow? I imagine there is a race where
userspace reads vdso after lm but before kernel updated
vdso - is that the concern?
Yes.
Then can't we fix it by interrupting all CPUs right after LM?

To me that seems like a cleaner approach - we then compartmentalize
the ABI issue - kernel has its own ABI against userspace,
devices have their own ABI against kernel.
It'd mean we need a way to detect that interrupt was sent,
maybe yet another counter inside that structure.

WDYT?

By the way the same idea would work for snapshots -
some people wanted to expose that info to userspace, too.
Those people included me. I wanted to interrupt all the vCPUs, even the
ones which were in userspace at the moment of migration, and have the
kernel deal with passing it on to userspace via a different ABI.

It ends up being complex and intricate, and requiring a lot of new
kernel and userspace support. I gave up on it in the end for snapshots,
and didn't go there again for this.

By contrast, a driver which merely exposes a page of MMIO space
identified by an ACPI device (without even the in-kernel PTP support)
could probably be fewer than a hundred lines of code. In an externally-
buildable module that goes back as far as RHEL8 or even further,
allowing users to just build and use it from their application.
was there supposed to be text here, or did you just like this
so much you decided to repost my mail ;) 
Hm, weirdness. I've known Evolution get into a state where it sends
completely *empty* messages, but I've never seen it eat only my own
part before. I had definitely typed responses (along the lines of the
above) last time.
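
For readers following the race discussed above (userspace reading stale
clock data after live migration), here is a minimal sketch of how a
shared page can let a reader detect both a torn update and a disruption
without any kernel involvement. The struct layout, field names and
function names are hypothetical illustrations and are not the vmclock
ABI from the patch; the writer function only stands in for the
hypervisor so that the example is self-contained.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for illustration only; NOT the real vmclock ABI. */
struct clock_page {
    _Atomic uint32_t seq;        /* odd while the writer is mid-update */
    _Atomic uint64_t disruption; /* bumped on each migration/snapshot */
    _Atomic uint64_t time_ns;    /* reference time published by the host */
};

struct clock_snapshot {
    uint64_t disruption;
    uint64_t time_ns;
};

/* Stand-in for the hypervisor side: publish a new consistent record. */
static void clock_page_write(struct clock_page *p, uint64_t disruption,
                             uint64_t time_ns)
{
    uint32_t s = atomic_load_explicit(&p->seq, memory_order_relaxed);

    /* Mark the record as in flux (odd), then order the data stores after it. */
    atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&p->disruption, disruption, memory_order_relaxed);
    atomic_store_explicit(&p->time_ns, time_ns, memory_order_relaxed);
    /* Publish: even again, with release so the data is visible first. */
    atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/*
 * Reader side: retry until two reads of seq match and are even.
 * Returns false if no consistent snapshot was obtained within max_tries.
 */
static bool clock_page_read(struct clock_page *p, struct clock_snapshot *out,
                            unsigned int max_tries)
{
    for (unsigned int i = 0; i < max_tries; i++) {
        uint32_t s1 = atomic_load_explicit(&p->seq, memory_order_acquire);
        if (s1 & 1)
            continue; /* writer is mid-update */

        uint64_t d = atomic_load_explicit(&p->disruption, memory_order_relaxed);
        uint64_t t = atomic_load_explicit(&p->time_ns, memory_order_relaxed);

        /* Order the data loads before the second read of seq. */
        atomic_thread_fence(memory_order_acquire);
        uint32_t s2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
        if (s1 == s2) {
            out->disruption = d;
            out->time_ns = t;
            return true;
        }
    }
    return false;
}

/* True if a migration or snapshot happened between the two snapshots. */
static bool clock_disrupted(const struct clock_snapshot *before,
                            const struct clock_snapshot *after)
{
    return before->disruption != after->disruption;
}
```

An application that needs precise timestamps would take one snapshot
before reading its clock and another afterwards, and discard the result
if clock_disrupted() reports a change. Because the check reads the
shared page directly, there is no window in which the kernel has yet to
catch up after migration.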

Attachments

  • smime.p7s [application/pkcs7-signature] 5965 bytes