Re: [PATCH] tracing: fprobe: fix suspicious rcu usage in fprobe_entry

From: "Paul E. McKenney" <paulmck@kernel.org>
Date: 2025-09-02 11:58:57
Also in: linux-crypto, lkml

On Tue, Sep 02, 2025 at 03:59:53PM +0900, Masami Hiramatsu wrote:

On Mon, 1 Sep 2025 08:00:15 -0700
"Paul E. McKenney" [off-list ref] wrote:

quoted

On Mon, Sep 01, 2025 at 05:06:55PM +0900, Masami Hiramatsu wrote:

quoted

On Fri, 29 Aug 2025 04:11:02 -0700
"Paul E. McKenney" [off-list ref] wrote:

quoted

On Thu, Aug 28, 2025 at 10:23:57PM -0400, Steven Rostedt wrote:

quoted

On Fri, 29 Aug 2025 10:14:36 +0800
Menglong Dong [off-list ref] wrote:

quoted

rcu_read_lock() is not needed in fprobe_entry, but rcu_dereference_check()
is used in rhltable_lookup(), which causes suspicious RCU usage warning:

  WARNING: suspicious RCU usage
  6.17.0-rc1-00001-gdfe0d675df82 #1 Tainted: G S
  -----------------------------
  include/linux/rhashtable.h:602 suspicious rcu_dereference_check() usage!
  ......
  stack backtrace:
  CPU: 1 UID: 0 PID: 4652 Comm: ftracetest Tainted: G S
  Tainted: [S]=CPU_OUT_OF_SPEC, [I]=FIRMWARE_WORKAROUND
  Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.1.1 10/07/2015
  Call Trace:
   <TASK>
   dump_stack_lvl+0x7c/0x90
   lockdep_rcu_suspicious+0x14f/0x1c0
   __rhashtable_lookup+0x1e0/0x260
   ? __pfx_kernel_clone+0x10/0x10
   fprobe_entry+0x9a/0x450
   ? __lock_acquire+0x6b0/0xca0
   ? find_held_lock+0x2b/0x80
   ? __pfx_fprobe_entry+0x10/0x10
   ? __pfx_kernel_clone+0x10/0x10
   ? lock_acquire+0x14c/0x2d0
   ? __might_fault+0x74/0xc0
   function_graph_enter_regs+0x2a0/0x550
   ? __do_sys_clone+0xb5/0x100
   ? __pfx_function_graph_enter_regs+0x10/0x10
   ? _copy_to_user+0x58/0x70
   ? __pfx_kernel_clone+0x10/0x10
   ? __x64_sys_rt_sigprocmask+0x114/0x180
   ? __pfx___x64_sys_rt_sigprocmask+0x10/0x10
   ? __pfx_kernel_clone+0x10/0x10
   ftrace_graph_func+0x87/0xb0

Fix this by using rcu_read_lock() for rhltable_lookup(). Alternatively, we
can use rcu_lock_acquire(&rcu_lock_map) here to obtain better performance.
However, it's not a common usage :/

So this is needed even though it's called under preempt_disable().

Paul, do we need to add an rcu_read_lock() because the code in rht
(rhashtable) requires RCU read lock?

I thought that rcu_read_lock() and preempt_disable() have been merged?

Yes, preempt_disable() does indeed start an RCU read-side critical section,
just as surely as rcu_read_lock() does.

However, this is a lockdep check inside of __rhashtable_lookup():

	rht_dereference_rcu(ht->tbl, ht)

Which is defined as:

	rcu_dereference_check(p, lockdep_rht_mutex_is_held(ht));

This is explicitly telling lockdep that rcu_read_lock() is OK and
holding ht->mutex is OK, but nothing else is.

That is similar to the kprobes, which also allows accessing in
rcu critical section or under mutex.

quoted

So an alternative way to fix this is to declare it to be a false positive,
and then avoid that false positive by adding a check that preemption
is disabled.  Adding the rhashtable maintainers for their perspective.

What about changing it alloing it with preempt disabled flag?

I am not sure that "it" that you are proposing changing.  ;-)

Sorry, Ii meant the rcu_dereference_check().

quoted

However, another option for the the above rcu_dereference_check() to
become something like this:

	rcu_dereference_check(p, lockdep_rht_mutex_is_held(ht) ||
				 rcu_read_lock_any_held());

This would be happy with any RCU reader, including rcu_read_lock(),
preempt_disable(), local_irq_disable(), local_bh_disable(), and various
handler contexts.  One downside is that this would *always* be happy in
a kernel built with CONFIG_PREEMPT_{NONE,VOLUNTARY}=y.

Ah, indeed. This means that we lose the ability to explicitly check
whether the rcu pointer is in a critical section on that kernel.

It is a usability/bug-detection design tradeoff, and as such, the RCU
user's choice.  RCU is simply an arms supplier on this one.  ;-)

quoted

If this is happening often enough, it would be easy for me to create an
rcu_dereference_all_check() that allows all forms of vanilla RCU readers
(but not, for example, SRCU readers), but with only two use cases,
it is not clear to me that this is an overall win.

OK, I think this discussion is important for the patch from Menglong [1]

[1] https://lore.kernel.org/all/20250829021436.19982-1-dongml2@chinatelecom.cn/ (local)

because this does not make an rcu critical section while using `head`
but it works because fprobe_entry() runs under preempt_disable().

Agreed, and that appears to be what initiated this dicussion.

Is it better to use `guard(rcu)()` instead of rcu_read_lock() so that
it explicitly secure the `head` usage? I just wonder if there is any
downside to extend rcu_read_lock() area (still in the same
preempt_disable section).

Another option is `scoped_guard(rcu)`, for example, as used in
ftrace_find_callable_addr() in arch/loongarch/kernel/ftrace_dyn.c.
That way you keep the RAII usability while keeping the RCU read-side
critical section small.

							Thanx, Paul

Thank you,

quoted

Or am I missing a turn in here somewhere?

							Thanx, Paul

quoted

Thank you,

quoted

							Thanx, Paul

quoted

-- Steve

quoted

Reported-by: kernel test robot <redacted>
Closes: https://lore.kernel.org/oe-lkp/202508281655.54c87330-lkp@intel.com (local)
Fixes: dfe0d675df82 ("tracing: fprobe: use rhltable for fprobe_ip_table")
Signed-off-by: Menglong Dong <redacted>
---
 kernel/trace/fprobe.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/fprobe.c b/kernel/trace/fprobe.c
index fb127fa95f21..fece0f849c1c 100644
--- a/kernel/trace/fprobe.c
+++ b/kernel/trace/fprobe.c

@@ -269,7 +269,9 @@ static int fprobe_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops,
 	if (WARN_ON_ONCE(!fregs))
 		return 0;
 
+	rcu_read_lock();
 	head = rhltable_lookup(&fprobe_ip_table, &func, fprobe_rht_params);
+	rcu_read_unlock();
 	reserved_words = 0;
 	rhl_for_each_entry_rcu(node, pos, head, hlist) {
 		if (node->addr != func)


-- 
Masami Hiramatsu (Google) [off-list ref]


-- 
Masami Hiramatsu (Google) [off-list ref]

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help