Re: [PATCH v7 2/2] clocksource: add J-Core timer/clocksource driver

From: Rich Felker <dalias@libc.org>
Date: 2016-10-02 03:59:43
Also in: linux-sh, lkml

On Sat, Oct 01, 2016 at 08:00:49PM -0400, Rich Felker wrote:

quoted

- During the whole sequence, hrtimer expiration times are being set to
  exact jiffies (@ 100 Hz), whereas before it they're quite arbitrary.

When a CPU goes into NOHZ idle and the next (timer/hrtimer) is farther out
than the next tick, then tick_sched_timer is set to this next event which
can be far out. So that's expected.

quoted

- The CLOCK_MONOTONIC hrtimer times do not match up with the
  timestamps; they're off by about 0.087s. I assume this is just
  sched_clock vs clocksource time and not a big deal.

Yes. You can tell the tracer to use clock monotonic so then they should match.

quoted

- The rcu_sched process is sleeping with timeout=1. This seems
  odd/excessive.

Why is that odd? That's one tick, i.e. 10ms in your case. And that's not
the problem at all. The problem is your timer not firing, but the cpu is
obviously either getting out of idle and then moves the tick ahead for some
unknown reason.

And a one-jiffy timeout is in fact expected behavior when HZ=100.
You have to be running HZ=250 or better to have two-jiffy timeouts,
and HZ=500 or better for three-jiffy timeouts.

One possible theory I'm looking at is that the two cpus are both
waking up (leaving cpu_idle_poll or cpuidle_idle_call) every jiffy
with sufficient consistency that every time the rcu_gp_fqs_check_wake
loop wakes up in rcu_gp_kthread, the other cpu is in cpu_idle_loop but
outside the rcu_idle_enter/rcu_idle_exit range. Would this block
forward process? I added an LED indicator in rcu_gp_fqs_check_wake
that shows the low 2 bits of rnp->qsmask every time it's called, and
under normal operation the LEDs just flash on momentarily or just one
stays on for a few seconds then goes off. During a stall both are
stuck on. I'm still trying to make sense of the code but my impression
so far is that, on a 2-cpu machine, this is a leaf node and the 2 bits
correspond directly to cpus; is that right? If so I'm a bit confused
because I don't see how forward progress could ever happen if the cpu
on which rcu_gp_kthread is blocking forward progress of
rcu_gp_kthread.

No.  If the CPUs are entering and leaving idle, and if your timers
were waking up rcu_sched every few jiffies like it asks, then the
repeated idle entry/exit events would be noticed, courtesy of the atomic
increments of ->dynticks and the rcu_sched kthread's snapshotting and
checking of this value.

I don't see how rcu_sched could notice the changes if it's stuck in
the wait loop I think it's stuck in. There is no check of ->dynticks
in rcu_gp_fqs_check_wake. Just in case checking *gfp & RCU_GP_FLAG_FQS
accomplishes this, I updated my LED hacks to clear the LEDs in that
exit path (and killed the other place that could turn them back on
from cpu_idle_loop) but I still get 2 LEDs on for 21s followed by a
stall message.

quoted

Even if the CPUs were always non-idle on every
time force_quiescent_state() is invoked, give or take the possibility
of counter wrap -- but even on a 32-bit system, that takes awhile.

Perhaps force_quiescent_state is not getting invoked? Does that sound
plausible, and if so, how should I go about debugging it? I see it is
called from the stall reporting code, so that's presumably what's
breaking the stalls.

I can confirm that force_quiescent_state is not being called at all
except from the stall handler. Where is it supposed to be called from?
I can't find any code paths to it except the stall handler and
__call_rcu_core, but the latter seems to only be called when adding
new rcu callbacks, not as a response to a stalled rcu_sched thread.
Maybe I'm missing something but this seems like incorrect logic in the
rcu subsystem.

Rich

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help