Re: [PATCH v2 net-next 00/15] locking: Introduce nested-BH locking.
From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Date: 2024-05-06 09:38:32
Also in:
lkml
On 2024-05-06 10:43:49 [+0200], Paolo Abeni wrote:
On Fri, 2024-05-03 at 20:25 +0200, Sebastian Andrzej Siewior wrote:quoted
Disabling bottoms halves acts as per-CPU BKL. On PREEMPT_RT code within local_bh_disable() section remains preemtible. As a result high prior tasks (or threaded interrupts) will be blocked by lower-prio task (or threaded interrupts) which are long running which includes softirq sections. The proposed way out is to introduce explicit per-CPU locks for resources which are protected by local_bh_disable() and use those only on PREEMPT_RT so there is no additional overhead for !PREEMPT_RT builds.Let me rephrase to check I understood the plan correctly. The idea is to pair 'bare' local_bh_{disable,enable} with local lock and late make local_bh_{disable,enable} no ops (on RT). With 'bare' I mean not followed by a spin_lock() - which is enough to ensure mutual exclusion vs BH on RT build - am I correct?
I might have I misunderstood your rephrase. But to make it clear: | $ git grep -p local_lock\( kernel/softirq.c | kernel/softirq.c=void __local_bh_disable_ip(unsigned long ip, unsigned int cnt) | kernel/softirq.c: local_lock(&softirq_ctrl.lock); this is what I want to remove. This is upstream RT only (not RT queue only). !RT builds are not affected by this change.
quoted
The series introduces the infrastructure and converts large parts of networking which is largest stake holder here. Once this done the per-CPU lock from local_bh_disable() on PREEMPT_RT can be lifted.AFAICS there are a bunch of local_bh_* call-sites under 'net' matching the above description and not addressed here. Is this series supposed to cover 'net' fully?
The net subsystem has not been fully audited but the major parts have been. I checked global per-CPU variables but there might be dynamic ones. Also new ones might have appeared in the meantime. There are two things which are not fixed yet that I am aware of: - tw_timer timer https://lore.kernel.org/all/20240415113436.3261042-1-vschneid@redhat.com/T/#u (local) - can gw https://lore.kernel.org/linux-can/20231031112349.y0aLoBrz@linutronix.de/ (local) https://lore.kernel.org/all/20231221123703.8170-1-socketcan@hartkopp.net/T/#u (local) That means those two need to be fixed first before that local_local() can disappear from local_bh_disable()/ enable. Also the whole tree should be checked.
Could you please include the diffstat for the whole series? I think/hope it will help catching the full picture more easily.
total over the series: | include/linux/filter.h | 134 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------- | include/linux/local_lock.h | 21 +++++++++++++++++++++ | include/linux/local_lock_internal.h | 31 +++++++++++++++++++++++++++++++ | include/linux/lockdep.h | 3 +++ | include/linux/netdevice.h | 12 ++++++++++++ | include/linux/sched.h | 9 ++++++++- | include/net/seg6_local.h | 1 + | include/net/sock.h | 5 +++++ | kernel/bpf/cpumap.c | 27 +++++++++++---------------- | kernel/bpf/devmap.c | 16 ++++++++-------- | kernel/fork.c | 3 +++ | kernel/locking/spinlock.c | 8 ++++++++ | net/bpf/test_run.c | 11 ++++++++++- | net/bridge/br_netfilter_hooks.c | 7 ++++++- | net/core/dev.c | 39 +++++++++++++++++++++++++++++++++------ | net/core/dev.h | 20 ++++++++++++++++++++ | net/core/filter.c | 107 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------------- | net/core/lwt_bpf.c | 9 +++++---- | net/core/skbuff.c | 25 ++++++++++++++++--------- | net/ipv4/tcp_ipv4.c | 15 +++++++++++---- | net/ipv4/tcp_sigpool.c | 17 +++++++++++++---- | net/ipv6/seg6_local.c | 22 ++++++++++++++-------- | net/xdp/xsk.c | 19 +++++++++++-------- | 23 files changed, 445 insertions(+), 116 deletions(-)
Note that some callers use local_bh_disable(), no additional lock, and there is no specific struct to protect, but enforce explicit serialization vs bh to a bunch of operation, e.g. the local_bh_disable() in inet_twsk_purge(). I guess such call site should be handled, too?
Yes but I didn't find much. inet_twsk_purge() is the first item from my list. On RT spin_lock() vs spin_lock_bh() usage does not deadlock and could be mixed. The only resources that can be protected by disabling BH are per-CPU resources. Either explicit defined (such as napi_alloc_cache) or implicit by other means of per-CPU usage such as a CPU-bound timer, worker, …. Protecting global variables by disabling BH is broken on SMP (see the CAN gw example) so I am not too worried about those. Unless you are aware of a category I did not think of.
Thanks! Paolo
Sebastian