Thread (14 messages), 4 authors, 2019-08-21

RE: [PATCH 0/2] clocksource/Hyper-V: Add Hyper-V specific sched clock function

From: Michael Kelley <hidden>
Date: 2019-08-20 14:32:49
Also in: linux-arch, lkml

From: Vitaly Kuznetsov <vkuznets@redhat.com> Sent: Tuesday, August 13, 2019 1:34 AM
Michael Kelley [off-list ref] writes:
quoted
From: Tianyu Lan <redacted> Sent: Tuesday, July 30, 2019 6:41 AM
quoted
On Mon, Jul 29, 2019 at 8:13 PM Vitaly Kuznetsov [off-list ref] wrote:
quoted
Peter Zijlstra [off-list ref] writes:
quoted
On Mon, Jul 29, 2019 at 12:59:26PM +0200, Vitaly Kuznetsov wrote:
quoted
lantianyu1986@gmail.com writes:
quoted
From: Tianyu Lan <redacted>

Hyper-V guests use the default native_sched_clock() in pv_ops.time.sched_clock
on x86.  But native_sched_clock() directly uses the raw TSC value, which
can be discontinuous in a Hyper-V VM.   Add the generic hv_setup_sched_clock()
to set the sched clock function appropriately.  On x86, this sets
pv_ops.time.sched_clock to read the Hyper-V reference TSC value that is
scaled and adjusted to be continuous.
The hypervisor can, in theory, disable the TSC page, and then we're forced
to use the MSR-based clocksource; using that as sched_clock() can be very
slow, I'm afraid.

On the other hand, what we have now is probably worse: TSC can,
actually, jump backwards (e.g. on migration) and we're breaking the
requirements for sched_clock().
That (obviously) also breaks the requirements for using TSC as
clocksource.

IOW, it breaks the entire purpose of having TSC in the first place.
Currently, we mark raw TSC as unstable when running on Hyper-V (see
88c9281a9fba6), 'TSC page' (which is TSC * scale + offset) is being used
instead. The problem is that 'TSC page' can be disabled by the
hypervisor and in that case the only remaining clocksource is MSR-based
(slow).
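
As a rough illustration of the "TSC * scale + offset" arithmetic mentioned
above: the Hyper-V reference TSC page defines the reference time as the upper
64 bits of the 128-bit product tsc * scale, plus an offset. The helper name
below is hypothetical, and the sketch assumes a 64-bit compiler with the
GCC/Clang unsigned __int128 extension; it is not the kernel's implementation.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function):
 *   reference_time = ((tsc * scale) >> 64) + offset
 * The 64x64 product needs 128 bits, hence unsigned __int128 (an assumption:
 * GCC/Clang extension, 64-bit targets only).
 */
static uint64_t scale_tsc_sketch(uint64_t tsc, uint64_t scale, int64_t offset)
{
    unsigned __int128 product = (unsigned __int128)tsc * scale;

    /* Keep the upper 64 bits, then apply the signed offset (mod 2^64). */
    return (uint64_t)(product >> 64) + (uint64_t)offset;
}
```

With scale = 2^63 the result is simply tsc / 2 plus the offset, which makes
the fixed-point nature of the scale value easy to see.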
Yes, that will be slow if Hyper-V doesn't expose the hv tsc page and the
kernel uses the MSR-based clocksource. Each MSR read will trigger one
VM-EXIT. This also happens on other hypervisors (e.g., when KVM doesn't
expose KVM clock). The hypervisor should take this into account and
determine which clocksource should be exposed or not.
We've confirmed with the Hyper-V team that the TSC page is always available
on Hyper-V 2016 and later, and on Hyper-V 2012 R2 when the physical
hardware presents an InvariantTSC.
Currently we check that the TSC page is valid on every read, and it seems
this is redundant, right? It is either available at boot or not. The only
exception I can imagine is migrating a VM to a non-InvariantTSC host, where
Hyper-V will likely disable the page (and we can get a reenlightenment
notification then).
I think Hyper-V can have brief intervals when the TSC page is not valid, so
the code checks for the "sequence" value being zero.   Otherwise, yes, it
should always be there or not be there.  Is there some other validity
check on every read that you are thinking of?
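
The per-read check on the "sequence" value can be sketched roughly as below.
This is a userspace approximation using C11 atomics, not the kernel code: the
struct layout and all names are hypothetical, the raw TSC is passed in as a
parameter to keep the sketch testable (a real reader would sample the TSC
inside the retry loop), and unsigned __int128 is assumed to be available
(GCC/Clang, 64-bit).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout, loosely modelled on a reference TSC page. */
struct tsc_page_sketch {
    _Atomic uint32_t sequence;  /* 0 means "page currently not valid" */
    _Atomic uint64_t scale;
    _Atomic int64_t  offset;
};

/*
 * Returns true and stores the scaled time in *now when the page is valid.
 * Returns false when sequence == 0, so the caller can fall back to the
 * (slow) MSR-based read.
 */
static bool read_tsc_page_sketch(struct tsc_page_sketch *pg,
                                 uint64_t raw_tsc, uint64_t *now)
{
    uint32_t seq;
    uint64_t scale;
    int64_t offset;

    do {
        seq = atomic_load_explicit(&pg->sequence, memory_order_acquire);
        if (seq == 0)
            return false;   /* page invalid right now */

        scale = atomic_load_explicit(&pg->scale, memory_order_relaxed);
        offset = atomic_load_explicit(&pg->offset, memory_order_relaxed);

        /* Order the data reads before re-checking the sequence. */
        atomic_thread_fence(memory_order_acquire);
    } while (atomic_load_explicit(&pg->sequence, memory_order_relaxed) != seq);

    *now = (uint64_t)(((unsigned __int128)raw_tsc * scale) >> 64)
           + (uint64_t)offset;
    return true;
}
```

The retry loop only matters if the page contents change while a read is in
progress; the zero check is what covers the brief "not valid" intervals.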
quoted
But the Linux Kconfigs are set up so
the TSC page is not used for 32-bit guests -- all clock reads are synthetic MSR
reads.  For 32-bit, this set of changes will add more overhead because the
sched clock reads will now be MSR reads.

I would be inclined to fix the problem, even with the perf hit on 32-bit Linux.
I don’t have any data on 32-bit Linux being used in a Hyper-V guest, but it's not
supported in Azure so usage is pretty small.  The alternative would be to continue
to use the raw TSC value on 32-bit, even with the risk of a discontinuity in case of
live migration or similar scenarios.
The issue needs fixing, I agree, however using MSR based clocksource as
sched clock may give us too big of a performance hit (not sure who cares
about 32 bit guest performance nowadays but still). What stops us from
enabling TSC page for 32 bit guests if it is available?
I talked to KY Srinivasan for any history about TSC page on 32-bit.  He said
there was no technical reason not to implement it, but our focus was always
64-bit Linux, so the 32-bit was much less important.  Also, on 32-bit Linux,
the required 64x64 multiply and shift is more complex and takes
more cycles (compare the 32-bit implementation of mul_u64_u64_shr vs.
the 64-bit implementation), so the win over an MSR read is less.  I
don't know of any actual measurements being made to compare vs.
MSR read.
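
To give a feel for the extra work on 32-bit, here is a minimal sketch of
taking the upper 64 bits of a 64x64 product using only 32x32->64 multiplies,
which is roughly the kind of decomposition a 32-bit build has to do. The
function name is hypothetical and this is not the kernel's mul_u64_u64_shr;
it only covers the shift-by-64 case.

```c
#include <stdint.h>

/*
 * Hypothetical helper: upper 64 bits of a * b, built from four
 * 32x32->64 partial products instead of one 128-bit multiply.
 */
static uint64_t mul_u64_u64_hi_sketch(uint64_t a, uint64_t b)
{
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    uint64_t lo_lo = a_lo * b_lo;   /* contributes to bits   0..63  */
    uint64_t hi_lo = a_hi * b_lo;   /* contributes to bits  32..95  */
    uint64_t lo_hi = a_lo * b_hi;   /* contributes to bits  32..95  */
    uint64_t hi_hi = a_hi * b_hi;   /* contributes to bits  64..127 */

    /* Sum everything landing in bits 32..63 to find the carry into bit 64. */
    uint64_t mid = (lo_lo >> 32) + (uint32_t)hi_lo + (uint32_t)lo_hi;

    return hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
}
```

That is four multiplies plus the carry handling, versus a single widening
multiply on 64-bit, which is why the gain over an MSR read shrinks.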

Michael
--
Vitaly