Re: [Xen-devel] [RFC PATCH V2 11/11] x86: tsc: avoid system instability in... | netdev

Re: [Xen-devel] [RFC PATCH V2 11/11] x86: tsc: avoid system instability in hibernation

From: Singh, Balbir <hidden>
Date: 2020-01-13 15:02:30
Also in: linux-mm, linux-pm, lkml, xen-devel

On Mon, 2020-01-13 at 13:01 +0000, Andrew Cooper wrote:

On 13/01/2020 11:43, Singh, Balbir wrote:

On Mon, 2020-01-13 at 11:16 +0100, Peter Zijlstra wrote:

On Fri, Jan 10, 2020 at 07:35:20AM -0800, Eduardo Valentin wrote:

Hey Peter,

On Wed, Jan 08, 2020 at 11:50:11AM +0100, Peter Zijlstra wrote:

On Tue, Jan 07, 2020 at 11:45:26PM +0000, Anchal Agarwal wrote:

From: Eduardo Valentin <redacted>

System instability is seen during resume from hibernation when the system
is under heavy CPU load. This is due to the lack of update of sched clock
data: the scheduler then thinks that heavy CPU-hog tasks need more time on
the CPU, causing the system to freeze during the unfreezing of tasks. For
example, threaded IRQs and kernel processes servicing the network interface
may be delayed for several tens of seconds, making the system unreachable.
The fix for this situation is to mark the sched clock as unstable as early
as possible in the resume path, leaving it unstable for the duration of the
resume process. This forces the scheduler to attempt to align the sched
clock across CPUs using the delta with time of day, updating the sched
clock data. In a post-hibernation event, we can then mark the sched clock
as stable again, avoiding unnecessary syncs with time of day on systems in
which the TSC is reliable.
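The unstable-clock behaviour this commit message relies on ("align the sched clock across CPUs using the delta with time of day") can be modelled roughly as below. This is a simplified sketch loosely following the clamp in sched_clock_local() in kernel/sched/clock.c; the flattened argument names, the function signature, and TICK_NSEC = 4 ms (HZ=250) are assumptions for illustration, and the kernel's wrap-safe comparisons are replaced by plain max/min.

```python
# Simplified model of the unstable sched_clock path, loosely following
# sched_clock_local() in kernel/sched/clock.c. Argument names are
# hypothetical flattenings of the per-CPU sched_clock_data fields.

TICK_NSEC = 4_000_000  # assumption: HZ=250, so one tick is 4 ms


def sched_clock_local(prev_clock, tick_raw, tick_gtod, raw_now):
    """Return the filtered per-CPU clock value in nanoseconds.

    prev_clock -- last value this CPU's clock reported
    tick_raw   -- raw sched_clock() sample taken at the last tick
    tick_gtod  -- time-of-day sample taken at the same tick
    raw_now    -- current raw sched_clock() reading
    """
    # Raw progress since the last tick; a raw clock that went backwards
    # (e.g. a TSC restarted after resume) contributes nothing.
    delta = max(raw_now - tick_raw, 0)
    clock = tick_gtod + delta
    # Never go backwards relative to time of day or the previous value.
    min_clock = max(tick_gtod, prev_clock)
    # Never run more than one tick ahead of time of day.
    max_clock = max(prev_clock, tick_gtod + TICK_NSEC)
    return min(max(clock, min_clock), max_clock)


# A raw clock that jumped 30 s forward is held to one tick past gtod.
print(sched_clock_local(10_000_000_000, 50_000_000_000,
                        10_000_000_000, 80_000_000_000))  # 10004000000
```

In this model, whatever the raw clock does across a resume, the per-CPU value stays within one tick of the time-of-day sample, which is the property the patch is trying to force by marking the clock unstable.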

This makes no frigging sense what so bloody ever. If the clock is
stable, we don't care about sched_clock_data. When it is stable you get
a linear function of the TSC without complicated bits on.

When it is unstable, only then do we care about the sched_clock_data.
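Peter's point that a stable clock is "a linear function of the TSC" can be made concrete: on x86 the TSC-to-nanosecond conversion is a fixed multiply-and-shift plus an offset, in the style of cycles_2_ns() in arch/x86/kernel/tsc.c. The sketch below only models that arithmetic; make_cyc2ns(), the shift of 30 and the example frequencies are illustrative assumptions, not the kernel's calibration code.

```python
# Sketch of the stable x86 sched_clock path: a fixed multiply-and-shift of
# the TSC plus an offset, in the style of cycles_2_ns() in
# arch/x86/kernel/tsc.c. make_cyc2ns() and shift=30 are illustrative.

NSEC_PER_MSEC = 1_000_000


def make_cyc2ns(tsc_khz, shift=30):
    """Pick mult so that (cycles * mult) >> shift ~= cycles * 1e6 / tsc_khz."""
    mult = (NSEC_PER_MSEC << shift) // tsc_khz
    return mult, shift


def cycles_to_ns(cycles, mult, shift, offset=0):
    """Linear TSC -> ns conversion; no per-CPU state is consulted."""
    # Python ints do not overflow; the kernel uses mul_u64_u32_shr()
    # to compute the same product without overflowing 64 bits.
    return offset + ((cycles * mult) >> shift)


mult, shift = make_cyc2ns(2_000_000)  # a 2 GHz TSC
print(cycles_to_ns(2_000_000_000, mult, shift))  # 1000000000
```

Nothing resembling per-CPU sched_clock_data appears on this path, which is the crux of the objection.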

Yeah, maybe what is not clear here is that we are covering for a situation
where clock stability changes over time, e.g. at regular boot the clock is
stable, hibernation happens, then restore happens with a non-stable clock.

Still confused, who marks the thing unstable? The patch seems to suggest
you do yourself, but it is not at all clear why.

If TSC really is unstable, then it needs to remain unstable. If the TSC
really is stable then there is no point in marking it unstable.

Either way something is off, and you're not telling me what.

Hi, Peter

For your original comment, just wanted to clarify the following:

1. After hibernation, the machine can be resumed on a different but
compatible host (these are hibernated VM images)
2. This means the clock between host1 and host2 can/will be different

The guest's TSC value is part of all save/migrate/resume state. Given
this bug, I presume you've actually discarded all register state on
hibernate, and the TSC is starting again from 0?
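Andrew's point is that an ordinary save/restore carries the guest TSC along. On hardware without TSC scaling this is commonly done with a per-guest TSC offset, so the guest reads host TSC + offset. The sketch below is a hypothetical model, not Xen's or KVM's code (offset_for_restore() and guest_tsc() are made-up names, and TSC scaling is ignored); it shows that choosing the offset at restore time lets the guest TSC continue from its saved value instead of restarting near 0.

```python
# Minimal model of TSC offsetting (no TSC scaling): the guest reads
# host_tsc + offset, modulo 2**64. Function names are hypothetical.

MASK64 = (1 << 64) - 1


def offset_for_restore(saved_guest_tsc, host_tsc_at_restore):
    """Offset that makes the guest TSC resume from its saved value."""
    return (saved_guest_tsc - host_tsc_at_restore) & MASK64


def guest_tsc(host_tsc, offset):
    """What the guest observes when it executes RDTSC."""
    return (host_tsc + offset) & MASK64


# Saved on host1 at guest TSC 5,000,000; restored on host2 whose own
# TSC already reads 900,000,000,000.
off = offset_for_restore(5_000_000, 900_000_000_000)
print(guest_tsc(900_000_001_000, off))  # 5001000
```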

The frequency of the new TSC might very likely be different, but the
scale/offset in the paravirtual clock information should let Linux's
view of time stay consistent.
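The paravirtual clock Andrew mentions converts a raw TSC delta into nanoseconds using per-vCPU scale and offset fields that the hypervisor republishes, for example after migration. The sketch mirrors pvclock_scale_delta() from the Linux pvclock code and the system_time + scaled(tsc - tsc_timestamp) reading; the field names follow struct pvclock_vcpu_time_info, but the frequencies and values are made up, and version/flags handling is omitted.

```python
# Sketch of a pvclock read, mirroring pvclock_scale_delta() in the Linux
# pvclock code. Field names follow struct pvclock_vcpu_time_info; the
# values used below are made up for illustration.


def pvclock_scale_delta(delta, mul_frac, shift):
    """Scale a TSC delta to ns: (delta << shift) * mul_frac / 2**32."""
    if shift < 0:
        delta >>= -shift
    else:
        delta <<= shift
    return (delta * mul_frac) >> 32


def pvclock_read_ns(tsc, tsc_timestamp, system_time, tsc_to_system_mul, tsc_shift):
    """Guest time = system_time + scaled TSC ticks since tsc_timestamp."""
    return system_time + pvclock_scale_delta(tsc - tsc_timestamp,
                                             tsc_to_system_mul, tsc_shift)


# Host1: 2 GHz TSC, 0.5 ns per tick -> mul = 2**31, shift = 0.
t1 = pvclock_read_ns(1_000 + 2_000_000_000, 1_000, 10_000_000_000, 1 << 31, 0)
# After migration, host2 (1 GHz TSC, 1 ns per tick -> mul = 2**31, shift = 1)
# republishes system_time = t1 against its own tsc_timestamp.
t2 = pvclock_read_ns(500 + 1_000_000_000, 500, t1, 1 << 31, 1)
print(t1, t2)  # 11000000000 12000000000
```

Although the two hosts tick at different rates, one second of TSC ticks on either host advances guest time by one second, because only the published scale and offset change.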

I am looking for my old dmesg logs to revalidate this, but I seem to have
lost them. I think Eduardo had a different point, though. I should point
out that I was adding to the list of potentially missed assumptions.

In your comments, are you making the assumption that the host(s) is/are
the same? Just checking the assumptions being made and making sure we are
on the same page with them.

TSCs are a massive source of "fun".  I'm not surprised that there are
yet more bugs around.

Does anyone actually know what does/should happen to the real TSC on
native S4?  The default course of action should be for virtualisation to
follow suit.

~Andrew

Balbir
