Re: [PATCH bpf-next 02/17] bpf: allow RCU-protected lookups to happen from bh context
From: Daniel Borkmann <daniel@iogearbox.net>
Date: 2021-06-10 21:24:25
Also in: netdev
Hi Paul,

On 6/10/21 8:38 PM, Alexei Starovoitov wrote:
> On Wed, Jun 9, 2021 at 7:24 AM Toke Høiland-Jørgensen [off-list ref] wrote:
>> XDP programs are called from a NAPI poll context, which means the RCU
>> reference liveness is ensured by local_bh_disable(). Add
>> rcu_read_lock_bh_held() as a condition to the RCU checks for map lookups so
>> lockdep understands that the dereferences are safe from inside *either* an
>> rcu_read_lock() section *or* a local_bh_disable() section. This is done in
>> preparation for removing the redundant rcu_read_lock()s from the drivers.
>>
>> Signed-off-by: Toke Høiland-Jørgensen <redacted>
>> ---
>>  kernel/bpf/hashtab.c  | 21 ++++++++++++++-------
>>  kernel/bpf/helpers.c  |  6 +++---
>>  kernel/bpf/lpm_trie.c |  6 ++++--
>>  3 files changed, 21 insertions(+), 12 deletions(-)
>>
>> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
>> index 6f6681b07364..72c58cc516a3 100644
>> --- a/kernel/bpf/hashtab.c
>> +++ b/kernel/bpf/hashtab.c
>> @@ -596,7 +596,8 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
>>  	struct htab_elem *l;
>>  	u32 hash, key_size;
>>
>> -	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held());
>> +	WARN_ON_ONCE(!rcu_read_lock_held() && !rcu_read_lock_trace_held() &&
>> +		     !rcu_read_lock_bh_held());
>
> It's not clear to me whether rcu_read_lock_held() is still needed.
> All comments sound like rcu_read_lock_bh_held() is a superset of rcu
> that includes bh.
> But reading rcu source code it looks like RCU_BH is its own rcu flavor...
> which is confusing.
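Alexei's question is easier to see as a truth table. The sketch below is a userspace model, not kernel code: the hypothetical struct reader_ctx and its three booleans stand in for rcu_read_lock_held(), rcu_read_lock_trace_held() and rcu_read_lock_bh_held(), and the two functions mirror the condition handed to WARN_ON_ONCE() in __htab_map_lookup_elem() before and after the patch. It only models which contexts would trigger the warning; whether a bh-only context really provides the needed protection is exactly the open question.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the lockdep predicates; not kernel APIs. */
struct reader_ctx {
	bool rcu_held;       /* models rcu_read_lock_held() */
	bool rcu_trace_held; /* models rcu_read_lock_trace_held() */
	bool rcu_bh_held;    /* models rcu_read_lock_bh_held() */
};

/* Condition passed to WARN_ON_ONCE() before the patch:
 * warn unless a plain RCU or an RCU-tasks-trace reader is active. */
static bool warns_before(struct reader_ctx c)
{
	return !c.rcu_held && !c.rcu_trace_held;
}

/* Condition passed to WARN_ON_ONCE() after the patch:
 * additionally accept a bh-disabled (e.g. NAPI poll) context. */
static bool warns_after(struct reader_ctx c)
{
	return !c.rcu_held && !c.rcu_trace_held && !c.rcu_bh_held;
}
```

For a NAPI context with bh disabled and no rcu_read_lock(), warns_before() returns true and warns_after() returns false; with no protection at all, both return true, so the check still catches genuinely unprotected lookups.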
The series is a bit confusing to me as well. I recall we had a discussion
with Paul back in 2016, i.e. in the very early days of XDP, to get some
clarification on RCU vs RCU-bh flavours. Paul, given this series, I assume
the below no longer holds, and that in this case (since we're removing
rcu_read_lock() from the drivers) RCU-bh acts as a real superset?

Back then, according to your clarifications, this was not the case:

On Mon, Jul 25, 2016 at 11:26:02AM -0700, Alexei Starovoitov wrote:
> On Mon, Jul 25, 2016 at 11:03 AM, Paul E. McKenney
> [off-list ref] wrote:
[...]
>>> The crux of the question is whether a particular driver rx handler, when
>>> called from __do_softirq, needs to add an additional rcu_read_lock or
>>> whether it can rely on the mechanics of softirq.
>>
>> If it was rcu_read_lock_bh(), you could.
>>
>> But you didn't say rcu_read_lock_bh(), you instead said rcu_read_lock(),
>> which means that you absolutely cannot rely on softirq semantics.
>>
>> In particular, in CONFIG_PREEMPT=y kernels, rcu_preempt_check_callbacks()
>> will notice that there is no rcu_read_lock() in effect and report
>> a quiescent state for that CPU. Because rcu_preempt_check_callbacks()
>> is invoked from the scheduling-clock interrupt, it absolutely can
>> execute during do_softirq(), and therefore being in softirq context
>> in no way provides rcu_read_lock()-style protection.
>>
>> Now, Alexei's question was for CONFIG_PREEMPT=n kernels. However, in
>> that case, rcu_read_lock() and rcu_read_unlock() generate no code
>> in recent production kernels, so there is no performance penalty for
>> using them. (In older kernels, they implied a barrier().)
>>
>> So either way, with or without CONFIG_PREEMPT, you should use
>> rcu_read_lock() to get RCU protection.
>>
>> One alternative might be to switch to rcu_read_lock_bh(), but that
>> will add local_bh_disable() overhead to your read paths.
>>
>> Does that help, or am I missing the point of the question?
>
> thanks a lot for explanation.

Glad you liked it!

> I mistakenly assumed that _bh variants are 'stronger' and
> act as inclusive, but sounds like they're completely orthogonal
> especially with preempt_rcu=y.

Yes, they are pretty much orthogonal.

> With preempt_rcu=n and preempt=y, it would be the case, since
> bh disables preemption and rcu_read_lock does the same as well,
> right? Of course, the code shouldn't be relying on that, so we
> have to fix our stuff.

Indeed, especially given that the kernel currently won't allow you
to configure CONFIG_PREEMPT_RCU=n and CONFIG_PREEMPT=y. If it does,
please let me know, as that would be a bug that needs to be fixed.
(For one thing, I do not test that combination.)

Thanx, Paul

And now, fast-forward again to 2021 ... :)

Thanks,
Daniel
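Paul's 2016 explanation can be reduced to a toy model of what the scheduling-clock check looks at for each flavour. Everything here is hypothetical: struct toy_cpu and both functions illustrate the behaviour described in the quoted mail, they are not kernel code, and they deliberately say nothing about how the RCU flavours behave in the 2021 kernels this series targets.

```c
#include <stdbool.h>

/* Toy per-CPU state; field names are hypothetical, not kernel structures. */
struct toy_cpu {
	int rcu_nesting;  /* depth of rcu_read_lock() sections */
	bool bh_disabled; /* inside softirq or local_bh_disable() */
};

/* Preemptible RCU, as described in the 2016 mail: the tick-driven check
 * only looks at rcu_read_lock() nesting, so running in softirq context
 * does not stop a quiescent state from being reported. */
static bool toy_preempt_rcu_tick_reports_qs(struct toy_cpu c)
{
	return c.rcu_nesting == 0;
}

/* RCU-bh, same era: a bh-disabled region is a read-side critical section
 * for this flavour, while rcu_read_lock() nesting is ignored. */
static bool toy_rcu_bh_tick_reports_qs(struct toy_cpu c)
{
	return !c.bh_disabled;
}
```

For a NAPI handler with bh disabled but no rcu_read_lock(), toy_preempt_rcu_tick_reports_qs() returns true while toy_rcu_bh_tick_reports_qs() returns false; for an rcu_read_lock() section outside softirq the answers flip. Each flavour only sees its own marker, which is the orthogonality Paul describes.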