Thread: 14 messages, 3 authors, 2020-11-06

Re: [PATCH] arm64/smp: Move rcu_cpu_starting() earlier

From: Qian Cai <hidden>
Date: 2020-11-06 02:15:35
Also in: lkml
Subsystem: arm64 port (aarch64 architecture), read-copy update (rcu), the rest · Maintainers: Catalin Marinas, Will Deacon, "Paul E. McKenney", Frederic Weisbecker, Neeraj Upadhyay, Joel Fernandes, Josh Triplett, Boqun Feng, Uladzislau Rezki, Linus Torvalds

On Thu, 2020-11-05 at 15:28 -0800, Paul E. McKenney wrote:
On Thu, Nov 05, 2020 at 06:02:49PM -0500, Qian Cai wrote:
On Thu, 2020-11-05 at 22:22 +0000, Will Deacon wrote:
On Fri, Oct 30, 2020 at 04:33:25PM +0000, Will Deacon wrote:
On Wed, 28 Oct 2020 14:26:14 -0400, Qian Cai wrote:
The call to rcu_cpu_starting() in secondary_start_kernel() is not early
enough in the CPU-hotplug onlining process, which results in lockdep
splats as follows:

 WARNING: suspicious RCU usage
 -----------------------------
 kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!

[...]
Applied to arm64 (for-next/fixes), thanks!

[1/1] arm64/smp: Move rcu_cpu_starting() earlier
      https://git.kernel.org/arm64/c/ce3d31ad3cac
Hmm, this patch has caused a regression in the case that we fail to
online a CPU because it has incompatible CPU features and so we park it
in cpu_die_early(). We now get an endless spew of RCU stalls because the
core will never come online, but is being tracked by RCU. So I'm tempted
to revert this and live with the lockdep warning while we figure out a
proper fix.

What's the correct way to undo rcu_cpu_starting(), given that we cannot
invoke the full hotplug machinery here? Is it correct to call
rcutree_dying_cpu() on the bad CPU and then rcutree_dead_cpu() from the
CPU doing cpu_up(), or should we do something else?
It looks to me like rcu_report_dead() does the opposite of
rcu_cpu_starting(), so we could lift rcu_report_dead() out of
CONFIG_HOTPLUG_CPU and use it there to rewind. Paul?
Yes, rcu_report_dead() should do the trick.  Presumably the earlier
online-time CPU-hotplug notifiers are also unwound?
I don't think that is an issue here. cpu_die_early() sets CPU_STUCK_IN_KERNEL,
and then __cpu_up() will see a timeout waiting for the AP to come online and
deal with CPU_STUCK_IN_KERNEL accordingly. Thus, something like this? I don't
see anything in rcu_report_dead() that depends on CONFIG_HOTPLUG_CPU=y.
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 09c96f57818c..10729d2d6084 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -421,6 +421,8 @@ void cpu_die_early(void)
 
 	update_cpu_boot_status(CPU_STUCK_IN_KERNEL);
 
+	rcu_report_dead(cpu);
+
 	cpu_park_loop();
 }
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 2a52f42f64b6..bd04b09b84b3 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -4077,7 +4077,6 @@ void rcu_cpu_starting(unsigned int cpu)
 	smp_mb(); /* Ensure RCU read-side usage follows above initialization. */
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
 /*
  * The outgoing function has no further need of RCU, so remove it from
  * the rcu_node tree's ->qsmaskinitnext bit masks.
@@ -4117,6 +4116,7 @@ void rcu_report_dead(unsigned int cpu)
 	rdp->cpu_started = false;
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
 /*
  * The outgoing CPU has just passed through the dying-idle state, and we
  * are being invoked from the CPU that was IPIed to continue the offline
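
For anyone following along, here is a rough userspace sketch of why the unwind matters. It is a toy model, not kernel code: the toy_* names and the single 64-bit mask (standing in for the rcu_node tree's ->qsmaskinitnext masks) are assumptions made purely for illustration, and it only handles CPU numbers below 64.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for ->qsmaskinitnext: one bit per CPU that RCU tracks. */
static uint64_t tracked_mask;

/* Loosely models rcu_cpu_starting(): RCU begins waiting on this CPU. */
static void toy_cpu_starting(unsigned int cpu)
{
	tracked_mask |= UINT64_C(1) << cpu;
}

/* Loosely models rcu_report_dead(): RCU stops waiting on this CPU. */
static void toy_report_dead(unsigned int cpu)
{
	tracked_mask &= ~(UINT64_C(1) << cpu);
}

/*
 * A grace period can complete only once every tracked CPU has reported
 * a quiescent state. reported_mask has one bit per CPU that reported.
 */
static bool toy_gp_can_complete(uint64_t reported_mask)
{
	return (tracked_mask & ~reported_mask) == 0;
}
```

In this model, a CPU that was marked as starting and then parked never reports, so toy_gp_can_complete() stays false until toy_report_dead() clears its bit. That is the stall Will describes, and the rewind the patch above is meant to provide.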

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel