Re: clocksource_watchdog causing scheduling of timers every second (was [v13] support "task_isolation" mode)
From: Frederic Weisbecker <hidden>
Date: 2016-08-11 11:59:01
Also in: lkml
On Thu, Aug 11, 2016 at 10:40:02AM +0200, Peter Zijlstra wrote:
> On Thu, Aug 11, 2016 at 12:16:58AM +0200, Frederic Weisbecker wrote:
> > I had similar issues; this seems to happen when the tsc is considered not reliable (which doesn't necessarily mean unstable. I think it has to do with some x86 CPU feature flag).
>
> Right, as per the other email, in general we cannot know/assume the TSC to be working as intended :/

Yeah, I remember you explained that to me a little while ago.

> > IIRC, this _has_ to execute on all online CPUs because the TSC of every running CPU is concerned.
>
> With modern Intel we could run it on one CPU per package I think, but at the same time, too much in NOHZ_FULL assumes the TSC is indeed sane, so it doesn't make sense to me to keep the watchdog running. When it triggers it would also have to kill all NOHZ_FULL stuff, which would probably bring the entire machine down.
>
> Arguably we should issue a boot time warning if NOHZ_FULL is configured and the TSC watchdog is running.

That's a very good idea! We do that when tsc is unstable but indeed we can't seriously run NOHZ_FULL on a non-reliable tsc. I'll take care of that warning.

> > I personally override that by passing the tsc=reliable kernel parameter. Of course use it at your own risk.
>
> Yes, that is (sadly) our only option. Manually assert our hardware is solid under the intended workload and then manually disable the watchdog.

Right, I'll mention that in the warning. Thanks for those details!
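
For anyone who wants to check a machine before such a warning exists in the kernel, here is a rough userspace sketch of the same idea. It only inspects the kernel command line: it flags the case where nohz_full= is set but tsc=reliable is not. The function names are made up for this example, and the absence of tsc=reliable is only a heuristic, since the kernel may decide on its own to skip the watchdog; this is not how the kernel-side warning would be implemented.

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Map each kernel parameter to the list of values given for it.

    Simplification: splits on whitespace only, so quoted values that
    contain spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Bare flags such as "quiet" carry no value; record them as None.
        params.setdefault(key, []).append(value if sep else None)
    return params


def nohz_full_tsc_warnings(cmdline):
    """Return a list of warnings for nohz_full= used without tsc=reliable."""
    params = parse_cmdline(cmdline)
    nohz_cpus = [v for v in params.get("nohz_full", []) if v]
    tsc_reliable = "reliable" in params.get("tsc", [])
    if nohz_cpus and not tsc_reliable:
        return [
            "nohz_full=%s is set without tsc=reliable; the clocksource "
            "watchdog may still schedule timers on isolated CPUs"
            % nohz_cpus[-1]
        ]
    return []


if __name__ == "__main__":
    try:
        cmdline = Path("/proc/cmdline").read_text()
    except OSError:
        cmdline = ""  # not on Linux, or /proc is unavailable
    for warning in nohz_full_tsc_warnings(cmdline):
        print("warning:", warning)
```

Run on the target machine, it prints nothing when both parameters are present or when nohz_full= is not used at all. As said above, tsc=reliable remains at your own risk: it is a manual assertion that the hardware is solid under the intended workload.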