Thread (14 messages) 14 messages, 4 authors, 2019-08-21

RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function

From: Michael Kelley <hidden>
Date: 2019-08-12 19:22:32
Also in: linux-arch, lkml

From: Tianyu Lan <redacted> Sent: Tuesday, July 30, 2019 6:41 AM
On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov [off-list ref] wrote:
quoted
Peter Zijlstra [off-list ref] writes:
quoted
On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
quoted
lantianyu1986@gmail.com writes:
quoted
From: Tianyu Lan <redacted>

Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
on x86.  But native_sched_clock() directly uses the raw TSC value, which
can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
to set the sched clock function appropriately.  On x86, this sets
pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
scaled and adjusted to be continuous.
Hypervisor can, in theory, disable TSC page and then we're forced to use
MSR-based clocksource but using it as sched_clock() can be very slow,
I'm afraid.

On the other hand, what we have now is probably worse: TSC can,
actually, jump backwards (e.g. on migration) and we're breaking the
requirements for sched_clock().
That (obviously) also breaks the requirements for using TSC as
clocksource.

IOW, it breaks the entire purpose of having TSC in the first place.
Currently, we mark raw TSC as unstable when running on Hyper-V (see
88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
instead. The problem is that 'TSC page' can be disabled by the
hypervisor and in that case the only remaining clocksource is MSR-based
(slow).
Yes, that will be slow if Hyper-V doesn't expose hv tsc page and
kernel uses MSR based
clocksource. Each MSR read will trigger one VM-EXIT. This also happens on other
hypervisors (e,g, KVM doesn't expose KVM clock). Hypervisor should
take this into
account and determine which clocksource should be exposed or not.
We've confirmed with the Hyper-V team that the TSC page is always available
on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical
hardware presents an InvariantTSC.  But the Linux Kconfig's are set up so
the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR
reads.  For 32-bit, this set of changes will add more overhead because the
sched clock reads will now be MSR reads.

I would be inclined to fix the problem, even with the perf hit on 32-bit Linux.
I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not
supported in Azure so usage is pretty small.  The alternative would be to continue
to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of
live migration or similar scenarios.

Michael
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help