Re: [PATCH net-next 14/15] net: Reference bpf_redirect_info via task_struct on PREEMPT_RT.
From: Toke Høiland-Jørgensen <hidden>
Date: 2024-05-06 19:41:44
Also in:
bpf, lkml
Sebastian Andrzej Siewior [off-list ref] writes:
The XDP redirect process is two staged: - bpf_prog_run_xdp() is invoked to run a eBPF program which inspects the packet and makes decisions. While doing that, the per-CPU variable bpf_redirect_info is used. - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info and it may also access other per-CPU variables like xskmap_flush_list. At the very end of the NAPI callback, xdp_do_flush() is invoked which does not access bpf_redirect_info but will touch the individual per-CPU lists. The per-CPU variables are only used in the NAPI callback hence disabling bottom halves is the only protection mechanism. Users from preemptible context (like cpu_map_kthread_run()) explicitly disable bottom halves for protections reasons. Without locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. PREEMPT_RT has forced-threaded interrupts enabled and every NAPI-callback runs in a thread. If each thread has its own data structure then locking can be avoided. Create a struct bpf_net_context which contains struct bpf_redirect_info. Define the variable on stack, use bpf_net_ctx_set() to save a pointer to it. Use the __free() annotation to automatically reset the pointer once function returns. The bpf_net_ctx_set() may nest. For instance a function can be used from within NET_RX_SOFTIRQ/ net_rx_action which uses bpf_net_ctx_set() and NET_TX_SOFTIRQ which does not. Therefore only the first invocations updates the pointer. Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct bpf_redirect_info. On PREEMPT_RT the pointer to bpf_net_context is saved task's task_struct. On non-PREEMPT_RT builds the pointer saved in a per-CPU variable (which is always NODE-local memory). Using always the bpf_net_context approach has the advantage that there is almost zero differences between PREEMPT_RT and non-PREEMPT_RT builds.
Did you ever manage to get any performance data to see if this has an impact? [...]
+static inline struct bpf_net_context *bpf_net_ctx_get(void)
+{
+ struct bpf_net_context *bpf_net_ctx = this_cpu_read(bpf_net_context);
+
+ WARN_ON_ONCE(!bpf_net_ctx);If we have this WARN...
+static inline struct bpf_redirect_info *bpf_net_ctx_get_ri(void)
+{
+ struct bpf_net_context *bpf_net_ctx = bpf_net_ctx_get();
+
+ if (!bpf_net_ctx)
+ return NULL;... do we really need all the NULL checks? (not just here, but in the code below as well). I'm a little concerned that we are introducing a bunch of new branches in the XDP hot path. Which is also why I'm asking for benchmarks :) [...]
+ /* ri->map is assigned in __bpf_xdp_redirect_map() from within a eBPF + * program/ during NAPI callback. It is used during + * xdp_do_generic_redirect_map()/ __xdp_do_redirect_frame() from the + * redirect callback afterwards. ri->map is cleared after usage. + * The path has no explicit RCU read section but the local_bh_disable() + * is also a RCU read section which makes the complete softirq callback + * RCU protected. This in turn makes ri->map RCU protocted and it is
s/protocted/protected/
+ * sufficient to wait a grace period to ensure that no "ri->map == map" + * exist. dev_map_free() removes the map from the list and then
s/exist/exists/ -Toke