On Thu, 9 Feb 2017 14:55:50 -0800
Andy Lutomirski [off-list ref] wrote:
On Thu, Feb 9, 2017 at 12:45 PM, KY Srinivasan [off-list ref] wrote:
-----Original Message-----
From: Thomas Gleixner [mailto:tglx@linutronix.de]
Sent: Thursday, February 9, 2017 9:08 AM
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: x86@kernel.org; Andy Lutomirski <luto@amacapital.net>; Ingo Molnar
[off-list ref]; H. Peter Anvin [off-list ref]; KY Srinivasan
[off-list ref]; Haiyang Zhang [off-list ref]; Stephen
Hemminger [off-list ref]; Dexuan Cui
[off-list ref]; linux-kernel@vger.kernel.org;
devel@linuxdriverproject.org; virtualization@lists.linux-foundation.org
Subject: Re: [PATCH 2/2] x86/vdso: Add VCLOCK_HVCLOCK vDSO clock read
method
On Thu, 9 Feb 2017, Vitaly Kuznetsov wrote:
+#ifdef CONFIG_HYPERV_TSCPAGE
+static notrace u64 vread_hvclock(int *mode)
+{
+ const struct ms_hyperv_tsc_page *tsc_pg =
+ (const struct ms_hyperv_tsc_page *)&hvclock_page;
+ u64 sequence, scale, offset, current_tick, cur_tsc;
+
+ while (1) {
+ sequence = READ_ONCE(tsc_pg->tsc_sequence);
+ if (!sequence)
+ break;
+
+ scale = READ_ONCE(tsc_pg->tsc_scale);
+ offset = READ_ONCE(tsc_pg->tsc_offset);
+ rdtscll(cur_tsc);
+
+ current_tick = mul_u64_u64_shr(cur_tsc, scale, 64) + offset;
+
+ if (READ_ONCE(tsc_pg->tsc_sequence) == sequence)
+ return current_tick;
That sequence stuff still lacks a sensible explanation. It's fundamentally
different from the sequence counting we do in the kernel, so documentation
for it is really required.
The host is updating multiple fields in this shared TSC page and the sequence number is
used to ensure that the guest sees a consistent set of published values. If I remember
correctly, Xen has a similar mechanism.
So what's the actual protocol? When the hypervisor updates the page,
does it freeze all guest cpus? If not, how does it maintain
atomicity?
The protocol looks a lot like the Linux seqlock, but the seqlock has an extra
protection which is missing here.
The host needs to update the sequence number twice in order to guarantee ordering.
Otherwise the host and the guest can race.
Host                          Guest
Write offset
Write scale
Set tsc_sequence = N
                              Read sequence = N
                              Read scale
Write scale
Write offset
                              Read offset
                              Check sequence == N
Set tsc_sequence = N + 1
It looks like the current host-side protocol is wrong.
The solution that Andi Kleen invented, and that I used in seqlock, is for the writer to update
the sequence at the start and at the end of a transaction. If the sequence number is odd, the
reader knows an update is in progress and the data it sees may be inconsistent.
Host                                        Guest
Write offset
Write scale
Set tsc_sequence = N (end of transaction)
                                            Read sequence = N
                                            Spin until sequence is even (N is even)
                                            Read scale
Set tsc_sequence += 1
Write scale
Write offset
                                            Read offset
                                            Check sequence == N? (fails, it is N + 1)
Set tsc_sequence += 1 (end of transaction)
                                            Read sequence = N + 2
                                            Spin until sequence is even (i.e. N + 2)
                                            Read scale
                                            Read offset
                                            Check sequence == N + 2? (yes, ok)
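To make the two-increment scheme above concrete, here is a minimal userspace sketch
in C11. It is not the Hyper-V or kernel implementation: the struct layout, the demo_*
names and the choice of C11 atomics are all assumptions made for illustration, and it
ignores the "sequence == 0 means page invalid" case from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared page; NOT the real Hyper-V layout. */
struct demo_tsc_page {
	_Atomic uint32_t sequence;	/* even: stable, odd: update in progress */
	_Atomic uint64_t scale;
	_Atomic uint64_t offset;
};

/* Writer (host side): bump the sequence before and after the update. */
static void demo_page_update(struct demo_tsc_page *pg,
			     uint64_t scale, uint64_t offset)
{
	uint32_t seq = atomic_load_explicit(&pg->sequence, memory_order_relaxed);

	assert((seq & 1) == 0);	/* assumes a single writer, no open update */

	/* Start of transaction: sequence becomes odd. */
	atomic_store_explicit(&pg->sequence, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&pg->scale, scale, memory_order_relaxed);
	atomic_store_explicit(&pg->offset, offset, memory_order_relaxed);

	/* End of transaction: sequence becomes even again. */
	atomic_store_explicit(&pg->sequence, seq + 2, memory_order_release);
}

/* Reader (guest side): one attempt, false means "retry". */
static bool demo_page_try_read(struct demo_tsc_page *pg,
			       uint64_t *scale, uint64_t *offset)
{
	uint32_t seq = atomic_load_explicit(&pg->sequence, memory_order_acquire);

	if (seq & 1)
		return false;	/* writer is in the middle of an update */

	*scale = atomic_load_explicit(&pg->scale, memory_order_relaxed);
	*offset = atomic_load_explicit(&pg->offset, memory_order_relaxed);

	/* Order the data loads before the sequence re-check. */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&pg->sequence, memory_order_relaxed) == seq;
}

/* Reader loop: spin until a consistent pair has been observed. */
static void demo_page_read(struct demo_tsc_page *pg,
			   uint64_t *scale, uint64_t *offset)
{
	while (!demo_page_try_read(pg, scale, offset))
		;
}
```

With a single increment the reader in the first diagram passes its check while holding
an old scale and a new offset. Here the same interleaving leaves the sequence at N + 1
when the reader re-checks it, so the attempt is rejected.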
Also, it is faster to read only scale and offset inside this loop, and to defer
reading the TSC and doing the multiply until after scale and offset have been acquired.
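A rough sketch of that restructuring, again in userspace C11 rather than vDSO code.
The demo_* names, the function-pointer TSC source and the use of unsigned __int128
(a GCC/Clang extension) in place of mul_u64_u64_shr() are assumptions for illustration.
It also assumes that a scale/offset pair captured in the loop is still valid for a TSC
value read just after the loop, which would need to be confirmed against the host's
behaviour.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared page; NOT the real Hyper-V layout. */
struct demo_clock_page {
	_Atomic uint32_t sequence;	/* even: stable, odd: update in progress */
	_Atomic uint64_t scale;
	_Atomic uint64_t offset;
};

/* High 64 bits of a 64x64 multiply, i.e. (a * b) >> 64. */
static uint64_t demo_mul_hi64(uint64_t a, uint64_t b)
{
	return (uint64_t)(((unsigned __int128)a * b) >> 64);
}

/* Fixed TSC value standing in for rdtsc, so the example is deterministic. */
static uint64_t demo_fake_tsc(void)
{
	return 1000;
}

static uint64_t demo_read_clock(struct demo_clock_page *pg,
				uint64_t (*read_tsc)(void))
{
	uint64_t scale, offset;
	uint32_t seq;

	/* The retry loop only copies the two parameters. */
	do {
		seq = atomic_load_explicit(&pg->sequence, memory_order_acquire);
		scale = atomic_load_explicit(&pg->scale, memory_order_relaxed);
		offset = atomic_load_explicit(&pg->offset, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while ((seq & 1) ||
		 atomic_load_explicit(&pg->sequence, memory_order_relaxed) != seq);

	/* TSC read and multiply happen once, outside the loop. */
	return demo_mul_hi64(read_tsc(), scale) + offset;
}
```

The point of the split is that a retry repeats only three loads instead of an rdtsc
and a 128-bit multiply.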