Thread: 38 messages, 7 authors, 2016-08-15

Re: clocksource_watchdog causing scheduling of timers every second (was [v13] support "task_isolation" mode)

From: Frederic Weisbecker <hidden>
Date: 2016-08-11 11:59:01
Also in: lkml

On Thu, Aug 11, 2016 at 10:40:02AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 11, 2016 at 12:16:58AM +0200, Frederic Weisbecker wrote:
> > I had similar issues, this seems to happen when the tsc is considered not reliable
> > (which doesn't necessarily mean unstable. I think it has to do with some x86 CPU feature
> > flag).
>
> Right, as per the other email, in general we cannot know/assume the TSC
> to be working as intended :/

Yeah, I remember you explained that to me a little while ago.

> > IIRC, this _has_ to execute on all online CPUs because the TSCs of all
> > running CPUs are concerned.
>
> With modern Intel we could run it on one CPU per package I think, but at
> the same time, too much in NOHZ_FULL assumes the TSC is indeed sane so
> it doesn't make sense to me to keep the watchdog running: when it
> triggers it would also have to kill all NOHZ_FULL stuff, which would
> probably bring the entire machine down..
>
> Arguably we should issue a boot time warning if NOHZ_FULL is configured
> and the TSC watchdog is running.

That's a very good idea! We do that when the tsc is unstable, but indeed we can't
seriously run NOHZ_FULL on a non-reliable tsc.

I'll take care of that warning.

> > I personally override that by passing the tsc=reliable kernel
> > parameter. Of course use it at your own risk.
>
> Yes, that is (sadly) our only option. Manually assert our hardware is
> solid under the intended workload and then manually disable the
> watchdog.

Right, I'll mention that in the warning.

Thanks for those details!
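
For readers who want to check their own machine before any such warning exists, the condition discussed above can be roughly approximated from userspace. This is a minimal sketch and not the kernel warning mentioned in this message. It assumes NOHZ_FULL is requested through the nohz_full= boot parameter and that tsc=reliable on the command line is how the watchdog was disabled, so it will miss full-dynticks setups enabled purely by build configuration and TSCs the kernel already treats as reliable on its own. The function names parse_cmdline and nohz_full_tsc_warning are hypothetical.

```python
from typing import Dict, Optional


def parse_cmdline(cmdline: str) -> Dict[str, str]:
    """Split a kernel command line into {key: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def nohz_full_tsc_warning(cmdline: str) -> Optional[str]:
    """Return a warning string if nohz_full= is set without tsc=reliable.

    Assumption: only the command line is inspected. Whether the clocksource
    watchdog is really running depends on kernel state not visible here.
    """
    params = parse_cmdline(cmdline)
    if "nohz_full" not in params:
        return None  # full dynticks not requested on the command line
    if params.get("tsc") == "reliable":
        return None  # the user has asserted that the TSC is trustworthy
    return (
        "nohz_full= is set but tsc=reliable is not: the clocksource "
        "watchdog may still schedule a timer on the isolated CPUs"
    )


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            message = nohz_full_tsc_warning(f.read())
    except OSError:
        message = None  # not on Linux, or /proc is unavailable
    if message:
        print(message)
```

Passing tsc=reliable remains a manual assertion about the hardware, as noted above, so a clean result from this check says nothing about whether the TSC is actually sane.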