Thread (19 messages, 3 authors, 2024-06-19)

Re: [PATCH v7 net-next 14/15] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.

From: Jesper Dangaard Brouer <hawk@kernel.org>
Date: 2024-06-18 08:14:54
Also in: bpf, lkml


On 18/06/2024 09.13, Sebastian Andrzej Siewior wrote:
The XDP redirect process has two stages:
- bpf_prog_run_xdp() is invoked to run an eBPF program which inspects the
   packet and makes decisions. While doing that, the per-CPU variable
   bpf_redirect_info is used.

- Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
   and it may also access other per-CPU variables like xskmap_flush_list.

At the very end of the NAPI callback, xdp_do_flush() is invoked which
does not access bpf_redirect_info but will touch the individual per-CPU
lists.

The per-CPU variables are only used in the NAPI callback, hence disabling
bottom halves is the only protection mechanism. Users from preemptible
context (like cpu_map_kthread_run()) explicitly disable bottom halves
for protection reasons.
Without locking in local_bh_disable() on PREEMPT_RT, this data structure
requires explicit locking.

PREEMPT_RT has forced-threaded interrupts enabled and every
NAPI callback runs in a thread. If each thread has its own data
structure, then locking can be avoided.

Create a struct bpf_net_context which contains struct bpf_redirect_info.
Define the variable on stack, use bpf_net_ctx_set() to save a pointer to
it, and bpf_net_ctx_clear() to remove it again.
The bpf_net_ctx_set() may nest. For instance, a function can be used from
within NET_RX_SOFTIRQ/net_rx_action, which uses bpf_net_ctx_set(), and
from NET_TX_SOFTIRQ, which does not. Therefore only the first invocation
updates the pointer.
Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
bpf_redirect_info. The returned data structure is zero-initialized to
ensure nothing is leaked from the stack. This is done on first usage of
the struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to
note that initialisation is required. The first invocation of
bpf_net_ctx_get_ri() will memset() the data structure and update
bpf_redirect_info::kern_flags.
bpf_redirect_info::nh is excluded from the memset because it is only used
once BPF_F_NEIGH is set, which also sets the nh member. The kern_flags is
moved past nh to exclude it from the memset as well.

The pointer to bpf_net_context is saved in the task's task_struct. Always
using the bpf_net_context approach has the advantage that there are
almost no differences between PREEMPT_RT and non-PREEMPT_RT builds.

Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Hao Luo <redacted>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Song Liu <song@kernel.org>
Cc: Stanislav Fomichev <redacted>
Cc: Toke Høiland-Jørgensen <redacted>
Cc: Yonghong Song <yonghong.song@linux.dev>
Cc: bpf@vger.kernel.org
Acked-by: Alexei Starovoitov <ast@kernel.org>
Reviewed-by: Toke Høiland-Jørgensen <redacted>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
---
  include/linux/filter.h | 56 ++++++++++++++++++++++++++++++++++--------
  include/linux/sched.h  |  3 +++
  kernel/bpf/cpumap.c    |  3 +++
  kernel/bpf/devmap.c    |  9 ++++++-
  kernel/fork.c          |  1 +
  net/bpf/test_run.c     | 11 ++++++++-
  net/core/dev.c         | 26 +++++++++++++++++++-
  net/core/filter.c      | 44 +++++++++------------------------
  net/core/lwt_bpf.c     |  3 +++
  9 files changed, 111 insertions(+), 45 deletions(-)