Thread: 4 messages, 3 authors, 2022-01-28

Re: ftrace hangs waiting for rcu

From: "Paul E. McKenney" <paulmck@kernel.org>
Date: 2022-01-28 16:15:55
Also in: linux-arm-kernel, linux-s390, lkml

On Fri, Jan 28, 2022 at 04:11:57PM +0000, Mark Rutland wrote:
On Fri, Jan 28, 2022 at 05:08:48PM +0100, Sven Schnelle wrote:
Hi Mark,

Mark Rutland [off-list ref] writes:
On arm64 I bisected this down to:

  7a30871b6a27de1a ("rcu-tasks: Introduce ->percpu_enqueue_shift for dynamic queue selection")

Which was going wrong because ilog2() rounds down, and so the shift was wrong
for any nr_cpus that was not a power-of-two. Paul had already fixed that in
rcu-next, and just sent a pull request to Linus:

  https://lore.kernel.org/lkml/20220128143251.GA2398275@paulmck-ThinkPad-P17-Gen-1/

With that applied, I no longer see these hangs.

Does your s390 test machine have a non-power-of-two nr_cpus, and does that fix
the issue for you?
We noticed the PR from Paul and are currently testing the fix. So far
it's looking good. The configuration where we have seen the hang is a
bit unusual:

- 16 physical CPUs on the kvm host
- 248 logical CPUs inside kvm
Aha! 248 is notably *NOT* a power of two, and in this case the shift would be
wrong (ilog2() would give 7, when we need a shift of 8).

So I suspect you're hitting the same issue as I was.
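
To make the arithmetic in Mark's note concrete, here is a minimal userspace
sketch of the rounding problem. It is not the kernel code: floor_log2(),
ceil_log2() and max_queue_index() are hypothetical helper names, and the
assumption that a queue is picked as "cpu >> shift" is inferred from the
commit title rather than taken from the patch itself.

```c
/* Userspace stand-in for a round-down log2 such as ilog2(); n must be > 0. */
static unsigned int floor_log2(unsigned int n)
{
	unsigned int r = 0;

	while (n >>= 1)
		r++;
	return r;
}

/* Round-up log2: the smallest shift for which (n - 1) >> shift == 0. */
static unsigned int ceil_log2(unsigned int n)
{
	unsigned int r = floor_log2(n);

	/* n & (n - 1) is nonzero exactly when n is not a power of two. */
	return (n & (n - 1)) ? r + 1 : r;
}

/*
 * Highest queue index any CPU in [0, nr_cpus) maps to, assuming
 * (hypothetically) that the queue is selected as cpu >> shift.
 */
static unsigned int max_queue_index(unsigned int nr_cpus, unsigned int shift)
{
	return (nr_cpus - 1) >> shift;
}
```

For nr_cpus = 248, floor_log2() gives 7 and ceil_log2() gives 8. Under the
assumed mapping, a shift of 7 sends CPUs 128..247 to queue index 1, while a
shift of 8 keeps every CPU on index 0. For a power of two such as 256 the two
helpers agree, which would explain why such systems did not show the hang.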
And apparently no one runs -next on systems having a non-power-of-two
number of CPUs.  ;-)

							Thanx, Paul
Thanks,
Mark.
- debug kernel both on the host and kvm guest

So things are likely a bit slow in the kvm guest. Interestingly, the
number of CPUs is even. But maybe RCU sees an odd number of CPUs and
gets confused before all CPUs are brought up. I'll have to read the
code and test to see whether that is possible.

Thanks for investigating!
Sven