Re: [PATCH v7 2/2] clocksource: add J-Core timer/clocksource driver
From: Rich Felker <dalias@libc.org>
Date: 2016-10-02 03:59:43
Also in:
linux-sh, lkml
On Sat, Oct 01, 2016 at 08:00:49PM -0400, Rich Felker wrote:
quoted
quoted
quoted
quoted
quoted
- During the whole sequence, hrtimer expiration times are being set to exact jiffies (@ 100 Hz), whereas before it they're quite arbitrary.When a CPU goes into NOHZ idle and the next (timer/hrtimer) is farther out than the next tick, then tick_sched_timer is set to this next event which can be far out. So that's expected.quoted
- The CLOCK_MONOTONIC hrtimer times do not match up with the timestamps; they're off by about 0.087s. I assume this is just sched_clock vs clocksource time and not a big deal.Yes. You can tell the tracer to use clock monotonic so then they should match.quoted
- The rcu_sched process is sleeping with timeout=1. This seems odd/excessive.Why is that odd? That's one tick, i.e. 10ms in your case. And that's not the problem at all. The problem is your timer not firing, but the cpu is obviously either getting out of idle and then moves the tick ahead for some unknown reason.And a one-jiffy timeout is in fact expected behavior when HZ=100. You have to be running HZ=250 or better to have two-jiffy timeouts, and HZ=500 or better for three-jiffy timeouts.One possible theory I'm looking at is that the two cpus are both waking up (leaving cpu_idle_poll or cpuidle_idle_call) every jiffy with sufficient consistency that every time the rcu_gp_fqs_check_wake loop wakes up in rcu_gp_kthread, the other cpu is in cpu_idle_loop but outside the rcu_idle_enter/rcu_idle_exit range. Would this block forward process? I added an LED indicator in rcu_gp_fqs_check_wake that shows the low 2 bits of rnp->qsmask every time it's called, and under normal operation the LEDs just flash on momentarily or just one stays on for a few seconds then goes off. During a stall both are stuck on. I'm still trying to make sense of the code but my impression so far is that, on a 2-cpu machine, this is a leaf node and the 2 bits correspond directly to cpus; is that right? If so I'm a bit confused because I don't see how forward progress could ever happen if the cpu on which rcu_gp_kthread is blocking forward progress of rcu_gp_kthread.No. If the CPUs are entering and leaving idle, and if your timers were waking up rcu_sched every few jiffies like it asks, then the repeated idle entry/exit events would be noticed, courtesy of the atomic increments of ->dynticks and the rcu_sched kthread's snapshotting and checking of this value.I don't see how rcu_sched could notice the changes if it's stuck in the wait loop I think it's stuck in. There is no check of ->dynticks in rcu_gp_fqs_check_wake. Just in case checking *gfp & RCU_GP_FLAG_FQS accomplishes this, I updated my LED hacks to clear the LEDs in that exit path (and killed the other place that could turn them back on from cpu_idle_loop) but I still get 2 LEDs on for 21s followed by a stall message.quoted
Even if the CPUs were always non-idle on every time force_quiescent_state() is invoked, give or take the possibility of counter wrap -- but even on a 32-bit system, that takes awhile.Perhaps force_quiescent_state is not getting invoked? Does that sound plausible, and if so, how should I go about debugging it? I see it is called from the stall reporting code, so that's presumably what's breaking the stalls.
I can confirm that force_quiescent_state is not being called at all except from the stall handler. Where is it supposed to be called from? I can't find any code paths to it except the stall handler and __call_rcu_core, but the latter seems to only be called when adding new rcu callbacks, not as a response to a stalled rcu_sched thread. Maybe I'm missing something but this seems like incorrect logic in the rcu subsystem. Rich