Re: [RFC PATCH V2 11/11] x86: tsc: avoid system instability in hibernation

From: Anchal Agarwal <hidden>
Date: 2020-01-22 20:07:54
Also in: linux-mm, linux-pm, lkml, xen-devel

On Tue, Jan 14, 2020 at 07:29:52PM +0000, Anchal Agarwal wrote:

On Tue, Jan 14, 2020 at 12:30:02AM +0100, Rafael J. Wysocki wrote:

On Mon, Jan 13, 2020 at 10:50 PM Rafael J. Wysocki [off-list ref] wrote:

On Mon, Jan 13, 2020 at 1:43 PM Peter Zijlstra [off-list ref] wrote:

On Mon, Jan 13, 2020 at 11:43:18AM +0000, Singh, Balbir wrote:

For your original comment, just wanted to clarify the following:

1. After hibernation, the machine can be resumed on a different but compatible
host (these are VM images hibernated)
2. This means the clock between host1 and host2 can/will be different

In your comments are you making the assumption that the host(s) is/are the
same? Just checking the assumptions being made and being on the same page with
them.

I would expect this to be the same problem we have as regular suspend,
after power off the TSC will have been reset, so resume will have to
somehow bridge that gap. I've no idea if/how it does that.

In general, this is done by timekeeping_resume() and the only special
thing done for the TSC appears to be the tsc_verify_tsc_adjust(true)
call in tsc_resume().

And I forgot about tsc_restore_sched_clock_state() that gets called
via restore_processor_state() on x86, before calling
timekeeping_resume().

In this case tsc_verify_tsc_adjust(true) does nothing, as the
feature bit X86_FEATURE_TSC_ADJUST is not available to the guest.
I am no expert in this area, but could this be messing things up?
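
For anyone who wants to check the same bits on their own guest, here is a
minimal sketch of how the two relevant CPUID feature bits decode. The
function names are made up for illustration and are not kernel APIs; the
register values are assumed to have been read with CPUID beforehand
(leaf 7, subleaf 0 for EBX; leaf 0x80000007 for EDX).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not kernel code. They only decode register
 * values that the caller has already obtained from CPUID.
 */

/* CPUID.(EAX=07H,ECX=0):EBX bit 1 reports IA32_TSC_ADJUST support. */
bool has_tsc_adjust(uint32_t leaf7_ebx)
{
    return (leaf7_ebx >> 1) & 1u;
}

/* CPUID.80000007H:EDX bit 8 reports an invariant TSC. */
bool has_invariant_tsc(uint32_t leaf80000007_edx)
{
    return (leaf80000007_edx >> 8) & 1u;
}
```

If both return false for the values seen inside the guest, that would
match what is described here: no TSC_ADJUST and no InvariantTSC exposed.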

Thanks,
Anchal

Gentle nudge on this. I will add more data here in case that helps.

1. Before this patch, the TSC is stable but hibernation does not work
100% of the time. I agree that if the TSC is stable it should not be
marked unstable. However, in this case, if I run a CPU-intensive
workload in the background and trigger a reboot-hibernation loop, I
see a workqueue lockup.

2. The lockup does not hose the system completely;
the reboot-hibernation loop carries on and the system recovers.
However, as mentioned in the commit message, the system does
become unreachable for a couple of seconds.

3. Xen suspend/resume seems to save/restore the time_memory area in
xen_arch_pre_suspend and xen_arch_post_suspend. The Xen clock value
is saved, and xen_sched_clock_offset is set at resume time to ensure a
monotonic clock value.

4. Also, the instances do not have InvariantTSC exposed. The feature bit
X86_FEATURE_TSC_ADJUST is not available to the guest, and the xen
clocksource is used by guests.
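
To illustrate what point 3 is getting at, here is a small userspace model
of the save/offset idea. It is only a sketch: the variable and function
names are hypothetical, the host clock is a plain variable, and none of
this is the actual arch/x86/xen code. It assumes the guest-visible clock
is "raw host clock minus an offset", that the guest-visible value is
recorded before suspend, and that the offset is recomputed on resume.

```c
#include <stdint.h>

/* Hypothetical stand-in for the clock provided by the current host, in ns. */
uint64_t host_clock_ns;

/* Offset subtracted from every raw read (illustrative name). */
uint64_t sched_clock_offset;

/* Guest-visible clock value recorded at suspend time. */
uint64_t clock_value_saved;

/* Guest-visible clock: raw host clock minus the current offset. */
uint64_t guest_sched_clock(void)
{
    return host_clock_ns - sched_clock_offset;
}

/* Before suspend: remember what the guest clock currently reads. */
void save_clock(void)
{
    clock_value_saved = guest_sched_clock();
}

/*
 * After resume, possibly on another host: pick the offset so that the
 * guest clock continues from the saved value. Unsigned wraparound is
 * well defined in C, so this also works when the new host clock is
 * behind the saved value.
 */
void restore_clock(void)
{
    sched_clock_offset = host_clock_ns - clock_value_saved;
}
```

With this model, if host1 reads 5 s with an offset of 4 s, the guest
sees 1 s. After save_clock(), a resume on a host2 whose clock reads
only 0.2 s, and restore_clock(), the guest still sees 1 s and keeps
advancing from there. The hibernation path would need an equivalent
step for the guest clock to stay monotonic across hosts.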

I am not sure whether something needs to be fixed on the hibernate path
itself or whether this is tied to time handling on Xen guest hibernation.

Here is a part of log from last hibernation exit to next hibernation
entry. The loop was running for a while so boot to lockup log will be
huge. I am specifically including the timestamps.

...
01h 57m 15.627s(  16ms): [    5.822701] OOM killer enabled.
01h 57m 15.627s(   0ms): [    5.824981] Restarting tasks ... done.
01h 57m 15.627s(   0ms): [    5.836397] PM: hibernation exit
01h 57m 17.636s(2009ms): [    7.844471] PM: hibernation entry
01h 57m 52.725s(35089ms): [   42.934542] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 37s!
01h 57m 52.730s(   5ms): [   42.941468] Showing busy workqueues and worker pools:
01h 57m 52.734s(   4ms): [   42.945088] workqueue events: flags=0x0
01h 57m 52.737s(   3ms): [   42.948385]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
01h 57m 52.742s(   5ms): [   42.952838]     pending: vmstat_shepherd, check_corruption
01h 57m 52.746s(   4ms): [   42.956927] workqueue events_power_efficient: flags=0x80
01h 57m 52.749s(   3ms): [   42.960731]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=4/256
01h 57m 52.754s(   5ms): [   42.964835]     pending: neigh_periodic_work, do_cache_clean [sunrpc], neigh_periodic_work, check_lifetime
01h 57m 52.781s(  27ms): [   42.971419] workqueue mm_percpu_wq: flags=0x8
01h 57m 52.781s(   0ms): [   42.974628]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
01h 57m 52.781s(   0ms): [   42.978901]     pending: vmstat_update
01h 57m 52.781s(   0ms): [   42.981822] workqueue ipv6_addrconf: flags=0x40008
01h 57m 52.781s(   0ms): [   42.985524]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
01h 57m 52.781s(   0ms): [   42.989670]     pending: addrconf_verify_work [ipv6]
01h 57m 52.782s(   1ms): [   42.993282] workqueue xfs-conv/xvda1: flags=0xc
01h 57m 52.786s(   4ms): [   42.996708]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=3/256
01h 57m 52.790s(   4ms): [   43.000954]     pending: xfs_end_io [xfs], xfs_end_io [xfs], xfs_end_io [xfs]
01h 57m 52.795s(   5ms): [   43.005610] workqueue xfs-reclaim/xvda1: flags=0xc
01h 57m 52.798s(   3ms): [   43.008945]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
01h 57m 52.802s(   4ms): [   43.012675]     pending: xfs_reclaim_worker [xfs]
01h 57m 52.805s(   3ms): [   43.015741] workqueue xfs-sync/xvda1: flags=0x4
01h 57m 52.808s(   3ms): [   43.018723]   pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256
01h 57m 52.811s(   3ms): [   43.022436]     pending: xfs_log_worker [xfs]
01h 57m 52.814s(   3ms): [   43.043519] Filesystems sync: 35.234 seconds
01h 57m 52.837s(  23ms): [   43.048133] Freezing user space processes ... (elapsed 0.001 seconds) done.
01h 57m 52.844s(   7ms): [   43.055996] OOM killer disabled.
01h 57m 53.838s( 994ms): [   43.061512] PM: Preallocating image memory... done (allocated 385859 pages)
01h 57m 53.843s(   5ms): [   44.054720] PM: Allocated 1543436 kbytes in 1.06 seconds (1456.07 MB/s)
01h 57m 53.861s(  18ms): [   44.060885] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
01h 57m 53.861s(   0ms): [   44.069715] printk: Suspending console(s) (use no_console_suspend to debug)
01h 57m 56.278s(2417ms): [   44.116601] Disabling non-boot CPUs ...
.....
The hibernate-resume loop continues after this. As mentioned above, I lose
connectivity for a while.
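
As a quick cross-check of the timestamps in the excerpt above, the gap
between "PM: hibernation entry" and the lockup report can be computed
from the kernel timestamps and compared with the wall-clock column. The
helper below is just an illustrative sketch (its name is made up); it
converts a "[seconds.microseconds]" printk timestamp to microseconds.

```c
#include <stdint.h>

/* Hypothetical helper: printk-style "[sec.usec]" timestamp in microseconds. */
uint64_t ts_us(uint64_t sec, uint64_t usec)
{
    return sec * 1000000ULL + usec;
}

/* Gap between two printk timestamps, in microseconds. */
uint64_t gap_us(uint64_t sec_a, uint64_t usec_a,
                uint64_t sec_b, uint64_t usec_b)
{
    return ts_us(sec_b, usec_b) - ts_us(sec_a, usec_a);
}
```

For the two lines in question, gap_us(7, 844471, 42, 934542) gives
35090071 us, i.e. about 35.09 s, which lines up with the 35089 ms delta
in the wall-clock column. This only says the two columns agree over
this window in this excerpt; it does not by itself explain the stall.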


Thanks,
Anchal
