Re: [clocksource] 8901ecc231: stress-ng.lockbus.ops_per_sec -9.5% regression
From: "Paul E. McKenney" <paulmck@kernel.org>
Date: 2021-08-05 15:37:31
Also in:
lkml, oe-lkp
On Thu, Aug 05, 2021 at 01:39:40PM +0800, Chao Gao wrote:
[snip]
> > > This patch works well; no false-positive (marking TSC unstable) in a 10hr stress test.
> > Very good, thank you! May I add your Tested-by?
> sure. Tested-by: Chao Gao <redacted>
Very good, thank you! I will apply this on the next rebase.
> > I expect that I will need to modify the patch a bit more to check for a system where it is -never- able to get a good fine-grained read from the clock.
> Agreed.
> > And it might be that your test run ended up in that state.
> Not that case judging from kernel logs. Coarse-grained check happened 6475 times in 43k seconds (by grep "coarse-grained skew check" in kernel logs). So, still many checks were fine-grained.
Whew! ;-) So about once per 13 clocksource watchdog checks. To Andi's point, do you have enough information in your console log to work out the longest run of coarse-grained clocksource checks?
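For readers following the arithmetic, here is a minimal sketch of how the "once per 13" figure and the longest run of coarse-grained checks might be worked out. It assumes the watchdog fires roughly every 0.5 seconds (WATCHDOG_INTERVAL is HZ >> 1 in kernel/time/clocksource.c), and it assumes the timestamps of the "coarse-grained skew check" messages have already been pulled out of the console log. The function names and the slack parameter are hypothetical and are not part of any kernel tooling.

```python
# Assumption: the clocksource watchdog runs about every 0.5 s
# (WATCHDOG_INTERVAL is HZ >> 1 in kernel/time/clocksource.c).
WATCHDOG_INTERVAL_S = 0.5


def checks_per_coarse(coarse_count, elapsed_s, interval_s=WATCHDOG_INTERVAL_S):
    """Average number of watchdog checks per coarse-grained check."""
    total_checks = elapsed_s / interval_s
    return total_checks / coarse_count


def longest_run(timestamps, interval_s=WATCHDOG_INTERVAL_S, slack=0.5):
    """Longest run of coarse-grained checks on back-to-back watchdog ticks.

    timestamps: sorted times (in seconds) of the coarse-grained messages.
    Two messages count as consecutive when they are no more than
    interval_s * (1 + slack) apart. The slack is a hypothetical allowance
    for timer jitter, not a value taken from the kernel.
    """
    if not timestamps:
        return 0
    best = run = 1
    limit = interval_s * (1 + slack)
    for prev, cur in zip(timestamps, timestamps[1:]):
        # Extend the current run on a back-to-back tick, else start over.
        run = run + 1 if cur - prev <= limit else 1
        best = max(best, run)
    return best


# Figures from the report: 6475 coarse-grained checks in about 43k seconds.
ratio = checks_per_coarse(6475, 43_000)

# Made-up timestamps, only to show the run detection.
example_run = longest_run([10.0, 10.5, 11.0, 20.0, 20.5])
```

With the reported figures the ratio comes out a little above 13, which matches the estimate above. The run detection is only as good as the log timestamps: a delayed watchdog timer could split one real run into two.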
> > My current thought is that if more than (say) 100 consecutive attempts to read the clocksource get hit with excessive delays, it is time to at least do a WARN_ON(), and maybe also time to disable the clocksource due to skew. The reason is that if reading the clocksource -always- sees excessive delays, perhaps the clock driver or hardware is to blame. Thoughts?
> It makes sense to me.
Sounds good!

Thanx, Paul
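The "more than (say) 100 consecutive attempts" idea discussed above could be modeled along the following lines. This is a user-space sketch, not the kernel patch: the class name, the method name, and the warn-only-once behavior are assumptions made for illustration, and only the threshold of 100 comes from the message.

```python
class DelayedReadTracker:
    """User-space model of a 'too many consecutive delayed reads' check."""

    def __init__(self, threshold=100):
        self.threshold = threshold  # the "(say) 100" from the message
        self.consecutive = 0        # current streak of delayed reads
        self.warned = False         # assumption: warn only once

    def record(self, delayed):
        """Record one read attempt; return True when the warning first fires."""
        if not delayed:
            # A clean fine-grained read ends the streak.
            self.consecutive = 0
            return False
        self.consecutive += 1
        if self.consecutive > self.threshold and not self.warned:
            self.warned = True
            return True
        return False


# Small threshold so the behavior is easy to follow.
tracker = DelayedReadTracker(threshold=3)
fired = [tracker.record(True) for _ in range(5)]

# Occasional good reads keep the streak from ever reaching the threshold.
mixed = DelayedReadTracker(threshold=3)
mixed_fired = [mixed.record(d) for d in [True, True, True, False, True, True]]
```

Resetting the counter on any good read is what separates a system that is merely busy from one that is never able to get a fine-grained read, which is the case the warning is meant to catch.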