Re: [RFC PATCH V2 11/11] x86: tsc: avoid system instability in hibernation

From: Eduardo Valentin <hidden>
Date: 2020-01-10 15:36:09
Also in: linux-mm, linux-pm, lkml, xen-devel

Hey Peter,

On Wed, Jan 08, 2020 at 11:50:11AM +0100, Peter Zijlstra wrote:

On Tue, Jan 07, 2020 at 11:45:26PM +0000, Anchal Agarwal wrote:

From: Eduardo Valentin <redacted>

System instability is seen during resume from hibernation when the system
is under heavy CPU load. This is due to the lack of update of sched
clock data, and the scheduler would then think that heavy CPU hog
tasks need more time on the CPU, causing the system to freeze
during the unfreezing of tasks. For example, threaded irqs
and kernel processes servicing network interfaces may be delayed
for several tens of seconds, causing the system to be unreachable.

The fix for this situation is to mark the sched clock as unstable
as early as possible in the resume path, leaving it unstable
for the duration of the resume process. This will force the
scheduler to attempt to align the sched clock across CPUs using
the delta with time of day, updating sched clock data. In a post
hibernation event, we can then mark the sched clock as stable
again, avoiding unnecessary syncs with time of day on systems
in which TSC is reliable.

This makes no frigging sense what so bloody ever. If the clock is
stable, we don't care about sched_clock_data. When it is stable you get
a linear function of the TSC without complicated bits on.

When it is unstable, only then do we care about the sched_clock_data.

Yeah, maybe what is not clear here is that we are covering for the situation
where clock stability changes over time, e.g. at regular boot the clock is
stable, hibernation happens, and then the restore happens on a non-stable clock.
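
To make the stable/unstable distinction above concrete, here is a minimal
userspace sketch. It is a toy model only, not the code in
kernel/sched/clock.c: every toy_* name, the tick length, and the exact
clamping window are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed tick length in ns (4 ms, i.e. HZ=250); illustration only. */
#define TOY_TICK_NS 4000000ULL

/* Hypothetical stand-in for per-CPU sched clock data. */
struct toy_scd {
	uint64_t tick_raw;	/* raw counter value at the last stamp */
	uint64_t tick_gtod;	/* time-of-day value at the last stamp */
	uint64_t clock;		/* last value handed out */
};

bool toy_clock_stable = true;	/* hypothetical global stability flag */
uint64_t toy_offset;		/* hypothetical constant offset */

static uint64_t toy_max(uint64_t a, uint64_t b) { return a > b ? a : b; }
static uint64_t toy_min(uint64_t a, uint64_t b) { return a < b ? a : b; }

uint64_t toy_sched_clock(struct toy_scd *scd, uint64_t raw_now)
{
	uint64_t delta, lo, hi, clock;

	assert(scd != NULL);

	/* Stable: a linear function of the raw counter; scd is not consulted. */
	if (toy_clock_stable)
		return raw_now + toy_offset;

	/*
	 * Unstable: start from the last time-of-day stamp, add the raw
	 * delta since that stamp, and clamp the result so it never goes
	 * backwards and never runs more than one tick past the stamp.
	 */
	delta = raw_now > scd->tick_raw ? raw_now - scd->tick_raw : 0;
	lo = toy_max(scd->tick_gtod, scd->clock);
	hi = toy_max(scd->clock, scd->tick_gtod + TOY_TICK_NS);
	clock = toy_min(toy_max(scd->tick_gtod + delta, lo), hi);

	scd->clock = clock;
	return clock;
}
```

In this toy model the per-CPU data only matters once the flag is cleared,
and with a stale stamp the unstable path cannot advance more than one tick
past tick_gtod until the stamp is refreshed.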

Reviewed-by: Erik Quanstrom <redacted>
Reviewed-by: Frank van der Linden <redacted>
Reviewed-by: Balbir Singh <redacted>
Reviewed-by: Munehisa Kamata <redacted>
Tested-by: Anchal Agarwal <redacted>
Signed-off-by: Eduardo Valentin <redacted>
---

NAK, the code very much relies on never getting marked stable again
after it gets set to unstable.

Well, actually, at PM_POST_HIBERNATION we do the check and mark the clock
stable only if it is known to be stable.

The issue only really happens during the restoration path under scheduling
pressure, which takes forever to finish, as described in the commit.

Do you see a better solution for this issue?

diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index 1152259a4ca0..374d40e5b1a2 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c

@@ -116,7 +116,7 @@ static void __scd_stamp(struct sched_clock_data *scd)
 	scd->tick_raw = sched_clock();
 }
 
-static void __set_sched_clock_stable(void)
+void set_sched_clock_stable(void)
 {
 	struct sched_clock_data *scd;

@@ -236,7 +236,7 @@ static int __init sched_clock_init_late(void)
 	smp_mb(); /* matches {set,clear}_sched_clock_stable() */
 
 	if (__sched_clock_stable_early)
-		__set_sched_clock_stable();
+		set_sched_clock_stable();
 
 	return 0;
 }

-- 
2.15.3.AMZN

-- 
All the best,
Eduardo Valentin
