RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function
From: Michael Kelley <hidden>
Date: 2019-08-12 19:22:32
Also in:
linux-arch, lkml
From: Tianyu Lan <redacted> Sent: Tuesday, July 30, 2019 6:41 AM
> On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov [off-list ref] wrote:
> > Peter Zijlstra [off-list ref] writes:
> > > On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
> > > > lantianyu1986@gmail.com writes:
> > > > > From: Tianyu Lan <redacted>
> > > > >
> > > > > Hyper-V guests use the default native_sched_clock() in
> > > > > pv_ops.time.sched_clock on x86. But native_sched_clock()
> > > > > directly uses the raw TSC value, which can be discontinuous
> > > > > in a Hyper-V VM. Add the generic hv_setup_sched_clock() to
> > > > > set the sched clock function appropriately. On x86, this
> > > > > sets pv_ops.time.sched_clock to read the Hyper-V reference
> > > > > TSC value that is scaled and adjusted to be continuous.
> > > >
> > > > Hypervisor can, in theory, disable TSC page and then we're
> > > > forced to use MSR-based clocksource but using it as
> > > > sched_clock() can be very slow, I'm afraid.
> > > >
> > > > On the other hand, what we have now is probably worse: TSC can,
> > > > actually, jump backwards (e.g. on migration) and we're breaking
> > > > the requirements for sched_clock().
> > >
> > > That (obviously) also breaks the requirements for using TSC as
> > > clocksource. IOW, it breaks the entire purpose of having TSC in
> > > the first place.
> >
> > Currently, we mark raw TSC as unstable when running on Hyper-V (see
> > 88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being
> > used instead. The problem is that 'TSC page' can be disabled by the
> > hypervisor and in that case the only remaining clocksource is
> > MSR-based (slow).
>
> Yes, that will be slow if Hyper-V doesn't expose hv tsc page and
> kernel uses MSR based clocksource. Each MSR read will trigger one
> VM-EXIT. This also happens on other hypervisors (e.g., KVM doesn't
> expose KVM clock). Hypervisor should take this into account and
> determine which clocksource should be exposed or not.

We've confirmed with the Hyper-V team that the TSC page is always
available on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the
physical hardware presents an InvariantTSC.

But the Linux Kconfigs are set up so the TSC page is not used for
32-bit guests -- all clock reads are synthetic MSR reads. For 32-bit,
this set of changes will add more overhead because the sched clock
reads will now be MSR reads.

I would be inclined to fix the problem, even with the perf hit on
32-bit Linux. I don't have any data on 32-bit Linux being used in a
Hyper-V guest, but it's not supported in Azure, so usage is probably
pretty small. The alternative would be to continue to use the raw TSC
value on 32-bit, even with the risk of a discontinuity in case of live
migration or similar scenarios.

Michael
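
For readers following the thread, here is a minimal sketch of the read path
being discussed: the 'TSC page' value is the raw TSC scaled and offset, and
when the page is not usable the read falls back to a slow path. This is not
the kernel's implementation. The struct layout, the function names, and
stub_msr_read() are hypothetical stand-ins, the sequence-number retry loop a
real reader needs is omitted, and the code assumes a compiler with
unsigned __int128 (GCC or Clang on a 64-bit target).

```c
#include <stdint.h>

/* Hypothetical, simplified view of the reference TSC page fields. */
struct ref_tsc_page {
	uint32_t sequence;  /* assumed: 0 means the page cannot be used */
	uint64_t scale;     /* 64.64 fixed-point multiplier */
	int64_t  offset;    /* added after scaling */
};

/* Scaled time: ((tsc * scale) >> 64) + offset, using a 128-bit product. */
static uint64_t ref_time_from_tsc(const struct ref_tsc_page *pg, uint64_t tsc)
{
	unsigned __int128 prod = (unsigned __int128)tsc * pg->scale;

	return (uint64_t)(prod >> 64) + (uint64_t)pg->offset;
}

/* Stand-in for the synthetic MSR read; in a guest this costs a VM exit. */
static uint64_t stub_msr_read(void)
{
	return 12345;
}

typedef uint64_t (*slow_read_fn)(void);

/* Use the TSC page when it is valid, otherwise take the slow path. */
static uint64_t read_ref_time(const struct ref_tsc_page *pg, uint64_t tsc,
			      slow_read_fn slow_read)
{
	if (pg->sequence == 0)
		return slow_read();

	return ref_time_from_tsc(pg, tsc);
}
```

In this sketch the 32-bit case described above corresponds to always taking
the slow_read() branch, which is where the extra sched clock overhead would
come from.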