Re: rcu/tree: Protect rcu_rdp_is_offloaded() invocations on RT | linux-arm-kernel

(off-list ancestor, not in this archive)

On Wed, Sep 22, 2021 at 01:36:27AM +0200, Frederic Weisbecker wrote:
On Tue, Sep 21, 2021 at 11:12:50PM +0200, Thomas Gleixner wrote:
Valentin reported warnings about suspicious RCU usage on RT kernels. Those
happen when offloading of RCU callbacks is enabled:

  WARNING: suspicious RCU usage
  5.13.0-rt1 #20 Not tainted
  -----------------------------
  kernel/rcu/tree_plugin.h:69 Unsafe read of RCU_NOCB offloaded state!

  rcu_rdp_is_offloaded (kernel/rcu/tree_plugin.h:69 kernel/rcu/tree_plugin.h:58)
  rcu_core (kernel/rcu/tree.c:2332 kernel/rcu/tree.c:2398 kernel/rcu/tree.c:2777)
  rcu_cpu_kthread (./include/linux/bottom_half.h:32 kernel/rcu/tree.c:2876)

The reason is that rcu_rdp_is_offloaded() is invoked without one of the
required protections on RT enabled kernels because local_bh_disable() does
not disable preemption on RT.

Valentin proposed to add a local lock to the code in question, but that's
suboptimal in several aspects:

  1) local locks add extra code to !RT kernels for no value.

  2) All possible callsites have to audited and amended when affected
     possible at an outer function level due to lock nesting issues.

  3) As the local lock has to be taken at the outer functions it's required
     to release and reacquire them in the inner code sections which might
     voluntary schedule, e.g. rcu_do_batch().

Both callsites of rcu_rdp_is_offloaded() which trigger this check invoke
rcu_rdp_is_offloaded() in the variable declaration section right at the top
of the functions. But the actual usage of the result is either within a
section which provides the required protections or after such a section.

So the obvious solution is to move the invocation into the code sections
which provide the proper protections, which solves the problem for RT and
does not have any impact on !RT kernels.
You also need to consider rcu_segcblist_completely_offloaded(). There
are two users:

1) The first chunk using it in rcu_core() checks if there is a need to
accelerate the callback and that can happen concurrently with nocb
manipulations on the cblist. Concurrent (de-)offloading could mess
up with the locking state but here is what we can do:

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index bce848e50512..3e56a1a4af03 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2728,9 +2728,10 @@ static __latent_entropy void rcu_core(void)
 
 	/* No grace period and unregistered callbacks? */
 	if (!rcu_gp_in_progress() &&
-	    rcu_segcblist_is_enabled(&rdp->cblist) && do_batch) {
+	    rcu_segcblist_is_enabled(&rdp->cblist)) {
 		rcu_nocb_lock_irqsave(rdp, flags);
-		if (!rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
+		if (!rcu_segcblist_completely_offloaded(&rdp->cblist) &&
+		    !rcu_segcblist_restempty(&rdp->cblist, RCU_NEXT_READY_TAIL))
 			rcu_accelerate_cbs_unlocked(rnp, rdp);
 		rcu_nocb_unlock_irqrestore(rdp, flags);
 	}
diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index 305cf6aeb408..64d615be3346 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -449,10 +449,9 @@ static void rcu_lockdep_assert_cblist_protected(struct rcu_data *rdp);
 static void __init rcu_organize_nocb_kthreads(void);
 #define rcu_nocb_lock_irqsave(rdp, flags)				\
 do {									\
+	local_irq_save(flags);						\
 	if (!rcu_segcblist_is_offloaded(&(rdp)->cblist))		\
-		local_irq_save(flags);					\
-	else								\
-		raw_spin_lock_irqsave(&(rdp)->nocb_lock, (flags));	\
+		raw_spin_lock(&(rdp)->nocb_lock);			\
 } while (0)
 #else /* #ifdef CONFIG_RCU_NOCB_CPU */
 #define rcu_nocb_lock_irqsave(rdp, flags) local_irq_save(flags)


Doing the local_irq_save() before checking that the segcblist is offloaded
protect that state from being changed (provided we lock the local rdp). Then we
can safely manipulate cblist, whether locked or unlocked.

2) The actual call to rcu_do_batch(). If we are preempted between
rcu_segcblist_completely_offloaded() and rcu_do_batch() with a deoffload in
the middle, we miss the callback invocation. Invoking rcu_core by the end of
deoffloading process should solve that.
Maybe invoke rcu_core() at that point?  My concern is that there might
be an extended time between the missed rcu_do_batch() and the end of
the deoffloading process.

Reported-by: Valentin Schneider <redacted>
Signed-off-by: Thomas Gleixner <redacted>
---
 kernel/rcu/tree.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2278,13 +2278,13 @@ rcu_report_qs_rdp(struct rcu_data *rdp)
 {
 	unsigned long flags;
 	unsigned long mask;
-	bool needwake = false;
-	const bool offloaded = rcu_rdp_is_offloaded(rdp);
+	bool offloaded, needwake = false;
 	struct rcu_node *rnp;
 
 	WARN_ON_ONCE(rdp->cpu != smp_processor_id());
 	rnp = rdp->mynode;
 	raw_spin_lock_irqsave_rcu_node(rnp, flags);
+	offloaded = rcu_rdp_is_offloaded(rdp);
 	if (rdp->cpu_no_qs.b.norm || rdp->gp_seq != rnp->gp_seq ||
 	    rdp->gpwrap) {
BTW Paul, if we happen to switch to non-NOCB (deoffload) some time after
rcu_report_qs_rdp(), it's possible that the rcu_accelerate_cbs()
that was supposed to be handled by nocb kthreads on behalf of
rcu_core() -> rcu_report_qs_rdp() would not happen. At least not until
we invoke rcu_core() again. Not sure how much harm that could cause.
Again, can we just invoke rcu_core() as soon as this is noticed?

@@ -2446,7 +2446,7 @@ static void rcu_do_batch(struct rcu_data
 	int div;
 	bool __maybe_unused empty;
 	unsigned long flags;
-	const bool offloaded = rcu_rdp_is_offloaded(rdp);
+	bool offloaded;
 	struct rcu_head *rhp;
 	struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl);
 	long bl, count = 0;
@@ -2472,6 +2472,7 @@ static void rcu_do_batch(struct rcu_data
 	rcu_nocb_lock(rdp);
 	WARN_ON_ONCE(cpu_is_offline(smp_processor_id()));
 	pending = rcu_segcblist_n_cbs(&rdp->cblist);
+	offloaded = rcu_rdp_is_offloaded(rdp);
offloaded is also checked later in rcu_do_batch(), after irqrestore. In
fact that should even become a rcu_segcblist_completely_offloaded() check
there because if there are still pending callbacks while we are de-offloading,
rcu_core() should be invoked. Hmm but that might be a remote rcu_core...

Anyway I guess we could live with some of those races with invoking rcu core on the
target after deoffloading.

I guess I should cook a series to handle all these issues one by one, then
probably we can live without a local_lock().
And thank you very much for looking this over!  Not simple stuff.  ;-)

							Thanx, Paul

Thanks.



 	div = READ_ONCE(rcu_divisor);
 	div = div < 0 ? 7 : div > sizeof(long) * 8 - 2 ? sizeof(long) * 8 - 2 : div;
 	bl = max(rdp->blimit, pending >> div);
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help

Possibly related (same subject, not in this thread)