Thread (38 messages), 7 authors, 2016-08-15

Re: clocksource_watchdog causing scheduling of timers every second (was [v13] support "task_isolation" mode)

From: Paul E. McKenney <hidden>
Date: 2016-08-11 22:29:17
Also in: lkml

On Thu, Aug 11, 2016 at 10:40:02AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 11, 2016 at 12:16:58AM +0200, Frederic Weisbecker wrote:
>
> > I had similar issues, this seems to happen when the tsc is considered not reliable
> > (which doesn't necessarily mean unstable. I think it has to do with some x86 CPU feature
> > flag).
>
> Right, as per the other email, in general we cannot know/assume the TSC
> to be working as intended :/
>
> > IIRC, this _has_ to execute on all online CPUs because every TSCs of running CPUs
> > are concerned.
>
> With modern Intel we could run it on one CPU per package I think, but at
> the same time, too much in NOHZ_FULL assumes the TSC is indeed sane so
> it doesn't make sense to me to keep the watchdog running, when it
> triggers it would also have to kill all NOHZ_FULL stuff, which would
> probably bring the entire machine down..

Well, you -could- force a very low priority CPU-bound task to run on
all nohz_full CPUs.  Not necessarily a good idea, but a relatively
non-intrusive response to that particular error condition.

							Thanx, Paul

> Arguably we should issue a boot time warning if NOHZ_FULL is configured
> and the TSC watchdog is running.
>
> > I personally override that with passing the tsc=reliable kernel
> > parameter. Of course use it at your own risk.
>
> Yes, that is (sadly) our only option. Manually assert our hardware is
> solid under the intended workload and then manually disabling the
> watchdog.
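
For illustration only, here is a rough userspace sketch of the "very low priority CPU-bound task on all nohz_full CPUs" idea from the reply above. It is not code from this thread and has not been tested against any particular kernel. It assumes the kernel exposes the nohz_full CPU list at /sys/devices/system/cpu/nohz_full (newer kernels do, and the file may read "(null)" or be empty when no CPUs are configured), and it assumes SCHED_IDLE is an acceptable stand-in for "very low priority". The helper names parse_cpulist, read_nohz_full_cpus, spin_on and start_spinners are made up for this example. Whether an extra runnable task actually keeps the tick alive on a given CPU depends on the scheduler and kernel version.

```python
import os

# Assumed sysfs location of the nohz_full CPU list (not mentioned in the thread).
NOHZ_FULL_PATH = "/sys/devices/system/cpu/nohz_full"


def parse_cpulist(text):
    """Parse kernel cpulist output such as '1-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        # An empty file or the literal '(null)' means no CPUs are listed.
        if not part or part == "(null)":
            continue
        lo, _, hi = part.partition("-")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return sorted(cpus)


def read_nohz_full_cpus(path=NOHZ_FULL_PATH):
    """Return the nohz_full CPUs, or an empty list if the file is absent."""
    try:
        with open(path) as f:
            return parse_cpulist(f.read())
    except FileNotFoundError:
        return []


def spin_on(cpu):
    """Pin the calling process to one CPU at SCHED_IDLE and spin forever."""
    os.sched_setaffinity(0, {cpu})
    os.sched_setscheduler(0, os.SCHED_IDLE, os.sched_param(0))
    while True:
        pass


def start_spinners(cpus):
    """Fork one spinner per CPU and return the child PIDs to the caller."""
    pids = []
    for cpu in cpus:
        pid = os.fork()
        if pid == 0:
            try:
                spin_on(cpu)
            finally:
                os._exit(1)  # only reached if pinning or the policy change fails
        pids.append(pid)
    return pids
```

A caller would run start_spinners(read_nohz_full_cpus()) and later stop the children with os.kill(). Nothing in the sketch starts spinning on its own, since doing so deliberately gives up the isolation that nohz_full is meant to provide.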
  