Thread (38 messages) 38 messages, 5 authors, 2024-05-24

Re: [PATCH net-next 14/15] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.

From: Alexei Starovoitov <hidden>
Date: 2024-05-06 23:10:01
Also in: bpf, lkml

On Mon, May 6, 2024 at 12:41 PM Toke Høiland-Jørgensen [off-list ref] wrote:
Sebastian Andrzej Siewior [off-list ref] writes:
quoted
The XDP redirect process is two staged:
- bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the
  packet and makes decisions. While doing that, the per-CPU variable
  bpf_redirect_info is used.

- Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info
  and it may also access other per-CPU variables like xskmap_flush_list.

At the very end of the NAPI callback, xdp_do_flush() is invoked which
does not access bpf_redirect_info but will touch the individual per-CPU
lists.

The per-CPU variables are only used in the NAPI callback hence disabling
bottom halves is the only protection mechanism. Users from preemptible
context (like cpu_map_kthread_run()) explicitly disable bottom halves
for protections reasons.
Without locking in local_bh_disable() on PREEMPT_RT this data structure
requires explicit locking.

PREEMPT_RT has forced-threaded interrupts enabled and every
NAPI-callback runs in a thread. If each thread has its own data
structure then locking can be avoided.

Create a struct bpf_net_context which contains struct bpf_redirect_info.
Define the variable on stack, use bpf_net_ctx_set() to save a pointer to
it. Use the __free() annotation to automatically reset the pointer once
function returns.
The bpf_net_ctx_set() may nest. For instance a function can be used from
within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and
NET_TX_SOFTIRQ which does not. Therefore only the first invocations
updates the pointer.
Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct
bpf_redirect_info.

On PREEMPT_RT the pointer to bpf_net_context is saved task's
task_struct. On non-PREEMPT_RT builds the pointer saved in a per-CPU
variable (which is always NODE-local memory). Using always the
bpf_net_context approach has the advantage that there is almost zero
differences between PREEMPT_RT and non-PREEMPT_RT builds.
Did you ever manage to get any performance data to see if this has an
impact?

[...]
quoted
+static inline struct bpf_net_context *bpf_net_ctx_get(void)
+{
+     struct bpf_net_context *bpf_net_ctx = this_cpu_read(bpf_net_context);
+
+     WARN_ON_ONCE(!bpf_net_ctx);
If we have this WARN...
quoted
+static inline struct bpf_redirect_info *bpf_net_ctx_get_ri(void)
+{
+     struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get();
+
+     if (!bpf_net_ctx)
+             return NULL;
... do we really need all the NULL checks?
Indeed.
Let's drop all NULL checks, since they definitely add overhead.
I'd also remove ifdef CONFIG_PREEMPT_RT and converge on single implementation:
static inline struct bpf_net_context * bpf_net_ctx_get(void)
{
 return current->bpf_net_context;
}
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help