Re: [clocksource] 8901ecc231: stress-ng.lockbus.ops_per_sec -9.5% regression
From: "Paul E. McKenney" <paulmck@kernel.org>
Date: 2021-08-06 04:15:06
Also in: lkml, oe-lkp
On Fri, Aug 06, 2021 at 10:10:00AM +0800, Chao Gao wrote:
On Thu, Aug 05, 2021 at 08:37:27AM -0700, Paul E. McKenney wrote:
On Thu, Aug 05, 2021 at 01:39:40PM +0800, Chao Gao wrote:
[snip]

This patch works well; no false-positive (marking TSC unstable) in a 10hr stress test.

Very good, thank you! May I add your Tested-by?

sure. Tested-by: Chao Gao <redacted>

Very good, thank you! I will apply this on the next rebase.

I expect that I will need to modify the patch a bit more to check for a system where it is -never- able to get a good fine-grained read from the clock.

Agreed.

And it might be that your test run ended up in that state.

Not the case, judging from kernel logs. Coarse-grained checks happened 6475 times in 43k seconds (by grep "coarse-grained skew check" in kernel logs). So, still many checks were fine-grained.

Whew! ;-) So about once per 13 clocksource watchdog checks. To Andi's point, do you have enough information in your console log to work out the longest run of coarse-grained clocksource checks?

Yes. 5 consecutive coarse-grained clocksource checks. Note that considering the reinitialization after a coarse-grained check, in my calculation, two coarse-grained checks are considered consecutive if they happen within 1s (+/- 0.3s) of each other.
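
For anyone wanting to repeat this log analysis, here is a rough sketch of both calculations: the "once per 13 checks" ratio and the longest run of consecutive coarse-grained checks. It assumes dmesg-style lines with a leading `[seconds.micros]` timestamp and a watchdog period of 0.5 s; neither is stated in this thread, and the helper names are made up for illustration. The 1s (+/- 0.3s) spacing rule is the one described above.

```python
import re

# Assumption: the clocksource watchdog fires every 0.5 s (not stated in the thread).
WATCHDOG_INTERVAL_S = 0.5

# Assumption: dmesg-style prefix such as "[  123.456789] ..."
TS_RE = re.compile(r"^\[\s*(\d+\.\d+)\]")


def coarse_timestamps(lines, needle="coarse-grained skew check"):
    """Return the timestamps (in seconds) of log lines containing `needle`."""
    stamps = []
    for line in lines:
        if needle in line:
            match = TS_RE.match(line)
            if match:
                stamps.append(float(match.group(1)))
    return stamps


def longest_run(stamps, gap=1.0, tol=0.3):
    """Longest run of checks whose spacing is gap +/- tol seconds."""
    if not stamps:
        return 0
    best = run = 1
    for prev, cur in zip(stamps, stamps[1:]):
        if abs((cur - prev) - gap) <= tol:
            run += 1
        else:
            run = 1
        best = max(best, run)
    return best


def checks_per_coarse(total_seconds, coarse_count, interval=WATCHDOG_INTERVAL_S):
    """Average number of watchdog checks per coarse-grained check."""
    return (total_seconds / interval) / coarse_count
```

With the numbers quoted above, `checks_per_coarse(43000, 6475)` comes out a little over 13, which matches the "about once per 13" estimate under the 0.5 s assumption.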

Very good, thank you! So it seems eminently reasonable to have the clocksource watchdog complain bitterly for more than (say) 100 consecutive coarse-grained checks. I am thinking in terms of a separate patch for this purpose.

Thoughts?

Thanx, Paul
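
To make the proposed policy concrete, the sketch below models a counter that complains once a run exceeds (say) 100 consecutive coarse-grained checks. This is an illustration only, not the kernel patch, which would be C in the clocksource watchdog; the class and method names are hypothetical, and warning only once per run is an assumption rather than something decided in this thread.

```python
class CoarseRunMonitor:
    """Toy model: flag a run of more than `limit` consecutive coarse checks."""

    def __init__(self, limit=100):
        self.limit = limit    # "(say) 100" from the proposal above
        self.run = 0          # current count of consecutive coarse-grained checks
        self.warned = False   # assumption: complain only once per run

    def record(self, coarse):
        """Record one watchdog check; return True if it should trigger a complaint."""
        if not coarse:
            # A fine-grained read succeeded, so the run is over.
            self.run = 0
            self.warned = False
            return False
        self.run += 1
        if self.run > self.limit and not self.warned:
            self.warned = True
            return True
        return False
```

With the default limit, the longest run reported in this test (5) would stay far below the complaint threshold.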