Re: [clocksource] 8901ecc231: stress-ng.lockbus.ops_per_sec -9.5% regression

From: Feng Tang <hidden>
Date: 2021-05-26 06:49:40
Also in: lkml, oe-lkp

On Sat, May 22, 2021 at 09:08:27AM -0700, Paul E. McKenney wrote:

On Fri, May 21, 2021 at 06:56:17AM -0700, Paul E. McKenney wrote:

quoted

On Fri, May 21, 2021 at 04:33:22PM +0800, kernel test robot wrote:

quoted


Greeting,

FYI, we noticed a -9.5% regression of stress-ng.lockbus.ops_per_sec due to commit:


commit: 8901ecc2315b850f35a7b8c1b73b12388b72aa78 ("clocksource: Retry clock read if long delays detected")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master


in testcase: stress-ng
on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
with following parameters:

	nr_threads: 100%
	disk: 1HDD
	testtime: 60s
	class: memory
	test: lockbus
	cpufreq_governor: performance
	ucode: 0x5003006


please be noted below in dmesg.xz (attached)
[   28.110351]
[   28.302357] hrtimer: interrupt took 1878423 ns
[   29.690760] clocksource: timekeeping watchdog on CPU53: hpet read-back delay of 169583ns, attempt 4, marking unstable
[   29.860306] tsc: Marking TSC unstable due to clocksource watchdog
[   30.559390] TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'.
[   30.726282] sched_clock: Marking unstable (30052964508, 499342225)<-(30915547410, -363240730)
[   31.620401] clocksource: Switched to clocksource hpet

If I am reading the dmesg correctly, there were many interrupts that
prevented a good clock read.  This sound to me like a bug that the
clocksource watchdog located, but please let me know if this is not
the case.

There are also the later "perf: interrupt took too long" messages.

And of course, increasing the clocksource.max_cswd_read_retries module
boot parameter (or clocksource.max_read_retries in the earlier commits,
which I will fix) can work around short bursts of NMIs.  Or long bursts
of NMIs, if you set this kernel boot parameter large enough.

I reproduced it on a borrowed baremetal 4 nodes, 96C/192T Xeon, with
latest stress-ng code https://github.com/ColinIanKing/stress-ng.git.
(2 sockets box should also be able to reproduce it)

Seems this sub testcase 'lockbus' is a extreme stress case, by loop
doing "lock" operation:

  c8:	f0 83 00 01          	lock addl $0x1,(%rax)
  cc:	f0 83 40 04 01       	lock addl $0x1,0x4(%rax)
  d1:	f0 83 40 08 01       	lock addl $0x1,0x8(%rax)
  d6:	f0 83 40 0c 01       	lock addl $0x1,0xc(%rax)
  db:	f0 83 40 10 01       	lock addl $0x1,0x10(%rax)
  e0:	f0 83 40 14 01       	lock addl $0x1,0x14(%rax)
  e5:	f0 83 40 18 01       	lock addl $0x1,0x18(%rax)
  ea:	f0 83 40 1c 01       	lock addl $0x1,0x1c(%rax)
  ef:	f0 83 01 00          	lock addl $0x0,(%rcx)
  f3:	f0 83 01 00          	lock addl $0x0,(%rcx)
  f7:	f0 83 01 00          	lock addl $0x0,(%rcx)
  fb:	f0 83 01 00          	lock addl $0x0,(%rcx)
  ff:	f0 83 01 00          	lock addl $0x0,(%rcx)
 103:	f0 83 01 00          	lock addl $0x0,(%rcx)
 107:	f0 83 01 00          	lock addl $0x0,(%rcx)
 ...

(The source c file and objdump are attached fyi)

So the watchdog read (read_hpet() here) sometimes does take very
long time (hundreds of microseconds) which breaks this sanity
read check, and cause 'unstable' tsc.

As from the man page of stress-ng:
"Use stress-ng with caution as some of the tests can make a system
run hot on poorly designed hardware and also can cause excessive
system thrashing which may be difficult to stop"

I don't think this 'lockbus' is close to any real-world usage.

Thanks,
Feng

Attachments

stress-lockbus.c [text/x-csrc] 4989 bytes · preview
objdump_lockbus.log [text/plain] 11918 bytes · preview

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help