Thread: 24 messages, 6 authors, 2021-08-06

Re: [clocksource] 8901ecc231: stress-ng.lockbus.ops_per_sec -9.5% regression

From: "Paul E. McKenney" <paulmck@kernel.org>
Date: 2021-08-05 15:37:31
Also in: lkml, oe-lkp

On Thu, Aug 05, 2021 at 01:39:40PM +0800, Chao Gao wrote:
[snip]
This patch works well; no false-positive (marking TSC unstable) in a
10hr stress test.
Very good, thank you!  May I add your Tested-by?
sure.
Tested-by: Chao Gao <redacted>
Very good, thank you!  I will apply this on the next rebase.
I expect that I will need to modify the patch a bit more to check for
a system where it is -never- able to get a good fine-grained read from
the clock.
Agreed.
And it might be that your test run ended up in that state.
Not that case judging from kernel logs. Coarse-grained check happened 6475
times in 43k seconds (by grep "coarse-grained skew check" in kernel logs).
So, still many checks were fine-grained.
Whew!  ;-)

So about once per 13 clocksource watchdog checks.
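
The "once per 13" figure can be reproduced with a short calculation. This is only a sketch: it assumes the clocksource watchdog runs every 0.5 seconds (an assumption not stated in this message), and the function name is made up for illustration.

```python
def checks_per_coarse(total_seconds, coarse_count, interval_seconds=0.5):
    """Return how many watchdog checks ran per coarse-grained check.

    interval_seconds is an assumed watchdog period; adjust it if the
    kernel under test uses a different interval.
    """
    total_checks = total_seconds / interval_seconds
    return total_checks / coarse_count


if __name__ == "__main__":
    # Figures from the report: 6475 coarse-grained checks in about 43k seconds.
    print(round(checks_per_coarse(43_000, 6475)))
```

Under that assumption, 43k seconds corresponds to roughly 86,000 watchdog checks, of which 6475 were coarse-grained.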

To Andi's point, do you have enough information in your console log to
work out the longest run of coarse-grained clocksource checks?
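
One way to estimate this from a console log is sketched below. It rests on several assumptions: each log line carries a dmesg-style "[ seconds.micros]" timestamp, only coarse-grained checks are logged (so consecutive checks have to be inferred from timestamps about one watchdog period apart), the period is 0.5 seconds, and no messages were dropped by rate limiting. The function name and the max_gap tolerance are hypothetical.

```python
import re

# Matches a dmesg-style timestamp such as "[ 1234.567890]".
TIMESTAMP = re.compile(r"\[\s*(\d+\.\d+)\]")


def longest_coarse_run(lines, marker="coarse-grained skew check", max_gap=0.75):
    """Return the longest run of back-to-back coarse-grained checks.

    Two marker lines count as consecutive when their timestamps differ by
    at most max_gap seconds (assumed: slightly more than one 0.5 s period).
    """
    longest = 0
    current = 0
    previous = None
    for line in lines:
        if marker not in line:
            continue
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # no timestamp, cannot place this line in time
        stamp = float(match.group(1))
        if previous is not None and stamp - previous <= max_gap:
            current += 1
        else:
            current = 1
        longest = max(longest, current)
        previous = stamp
    return longest
```

It can be fed an open log file directly, for example `longest_coarse_run(open("console.log"))`, where the file name is a placeholder.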
My current thought is that if more than (say) 100 consecutive attempts
to read the clocksource get hit with excessive delays, it is time to at
least do a WARN_ON(), and maybe also time to disable the clocksource
due to skew.  The reason is that if reading the clocksource -always-
sees excessive delays, perhaps the clock driver or hardware is to blame.

Thoughts?
It makes sense to me.
Sounds good!
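
The consecutive-delay idea above can be modeled in a few lines. This is a userspace illustration only, not the kernel patch: the class name, the record() method, and the default limit of 100 (taken from the "say 100" in the proposal) are all assumptions.

```python
class DelayedReadTracker:
    """Count consecutive clocksource reads that hit excessive delays."""

    def __init__(self, limit=100):
        self.limit = limit        # assumed threshold, per the "(say) 100" above
        self.consecutive = 0

    def record(self, delayed):
        """Record one read attempt.

        Returns True once more than `limit` attempts in a row were delayed,
        which is the point where a warning (or disabling the clocksource)
        would be considered. Any clean read resets the count.
        """
        if delayed:
            self.consecutive += 1
        else:
            self.consecutive = 0
        return self.consecutive > self.limit
```

Resetting on every clean read means only a clock that never yields a good fine-grained read trips the threshold, which matches the intent of blaming the driver or hardware only in that case.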

							Thanx, Paul