[PATCH RFC bpf-next 0/5] skb extension for BPF local storage
From: Jakub Sitnicki <jakub@cloudflare.com>
Date: 2026-02-26 21:13:06
Also in: bpf
Previously we have attempted to allow BPF users to attach tens of bytes
of arbitrary data to packets by making the XDP/skb metadata area persist
across netstack layers [1].

This approach turned out to be unsuccessful. It would require us to
restrict the layout of skb headroom and patch call sites which modify
the headroom by pushing/pulling the skb->data.

As per Jakub's feedback [2] we're turning our attention to skb
extensions as the new vehicle for passing BPF metadata. skb extensions
avoid these problems by being a separate, opt-in side allocation that
doesn't interfere with skb headroom layout.

With the switch to skb extensions, we are no longer restricted by the
features of XDP metadata, and hence we propose to extend the concept of
BPF local storage to socket buffers - skb local storage.

BPF local storage is an established pattern of attaching arbitrary data
from BPF context to various common kernel entities (sk, task, cgroup,
inode). It avoids some of the limitations of XDP metadata, namely:

1. Multiple users can allocate space for their data without the need to
   coordinate.

   BPF local storage solves this by allocating space for each user's
   BPF map and its elements separately. This matters when independent
   BPF programs owned by different parties (e.g. traffic policy and
   observability tooling) both need to annotate the same packets.

2. Lifetime of metadata is well-defined and can be precisely scoped.

   By default, skb local storage is scrubbed on clone, tunnel
   encap/decap, and netns crossing - matching the skb extension
   defaults. In later iterations we plan to let users relax these
   defaults through BPF map flags for packet tracking use cases (see
   Future Work below).

However, with flexibility also come downsides: BPF local storage is not
allocation-free like the skb->data_meta area. Creating the storage
imposes additional overhead, which translates to skb processing latency.
This is especially painful considering the relatively short lifetime of
sk_buff objects compared to other entities like sockets.

The overhead tolerance for this naive skb local storage implementation
depends on the pps rate and whether skb local storage gets created for
every packet or just some of them, for example, when sampling or
tagging the first packet in an L4 connection.

Our initial rough benchmarks on a VM with kernel.bpf_stats_enabled=1 [3]
show that running a tc/ingress prog that creates skb local storage and
writes to it amounts to ~330 nsec of per-packet overhead. Retrieving skb
local storage and reading from it in a cgroup_skb/ingress hook
contributes an additional ~115 nsec.

Rounding up to ~500 nsec per packet:

- at 100k pps, that's 5% of the 10 usec per-packet budget, but
- at 1 Mpps, that's already 50% of the 1 usec per-packet budget, which
  is not acceptable.

While definitely not suitable for high-pps flows, the naive skb local
storage implementation is arguably acceptable at low rates, for example
when you need to attach metadata only to the first packet of a TCP/QUIC
connection or sample packets at very low rates for tracing.

From this initial implementation, fit for the low-pps use cases, we
would like to work towards lowering the overhead to enable use at higher
packet rates as proposed in the LSF/MM/BPF topic [4].

Future work - in the next iterations on the RFC I'm planning to address:

1. skb local storage copying/uncloning when user opts in with
   BPF_F_CLONE,
2. opt out from scrubbing BPF local storage on tunnel decap/encap,
3. opt out from scrubbing BPF local storage on crossing netns boundary.

Items (2) and (3) are needed to support packet tracking use cases.

With this early posting I'm looking for feedback - is this going in the
direction that aligns with the maintainers' and reviewers' expectations
for the intended use of skb extensions and BPF local storage?

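For anyone who wants to plug in their own numbers, the per-packet budget
arithmetic above can be reproduced with a short user-space C sketch. It
is not part of the series: the helper names budget_ns() and
overhead_pct() are made up for illustration, and the inputs are only the
rough benchmark figures quoted above.

```c
#include <stdio.h>

/* Illustrative helpers only, not part of the patch series. */

/* Time available per packet, in nanoseconds, at a given packet rate. */
static double budget_ns(double pps)
{
	return 1e9 / pps;
}

/* Share of the per-packet budget eaten by a fixed overhead, in percent. */
static double overhead_pct(double overhead_ns, double pps)
{
	return 100.0 * overhead_ns / budget_ns(pps);
}

/* Print one row: rate, per-packet budget, and overhead share. */
static void print_budget(double overhead_ns, double pps)
{
	printf("%.0f pps: budget %.0f nsec, overhead %.1f%%\n",
	       pps, budget_ns(pps), overhead_pct(overhead_ns, pps));
}
```

Calling print_budget(500.0, 100e3) and print_budget(500.0, 1e6) from a
main() gives the 5% and 50% figures; with the unrounded 330 + 115 = 445
nsec, the 1 Mpps case comes to about 44.5%.
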
Thanks,
-jkbs

[1] https://lore.kernel.org/all/20260107-skb-meta-safeproof-netdevs-rx-only-v3-0-0d461c5e4764@cloudflare.com/
[2] https://lore.kernel.org/all/20260108174903.59323f72@kernel.org/
[3] https://github.com/jsitnicki/skb-metadata-tests/tree/main/skb-storage-bench
[4] https://msgid.link/87ecmffopy.fsf@cloudflare.com

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
Jakub Sitnicki (5):
      bpf: Introduce local storage for sk_buff
      bpf: Allow passing kernel context pointer to kfuncs
      bpf: Allow access to bpf_sock_ops_kern->skb
      selftests/bpf: Add verifier tests for skb local storage
      selftests/bpf: Add functional tests for skb local storage

 include/linux/bpf_types.h                          |   3 +
 include/linux/skbuff.h                             |   3 +
 include/net/bpf_skb_storage.h                      |  21 ++
 include/uapi/linux/bpf.h                           |   1 +
 kernel/bpf/syscall.c                               |   1 +
 kernel/bpf/verifier.c                              |  67 +++-
 net/Kconfig                                        |  10 +
 net/core/Makefile                                  |   1 +
 net/core/bpf_skb_storage.c                         | 264 ++++++++++++++
 net/core/skbuff.c                                  |  15 +
 .../testing/selftests/bpf/prog_tests/skb_storage.c | 405 +++++++++++++++++++++
 tools/testing/selftests/bpf/prog_tests/verifier.c  |   2 +
 tools/testing/selftests/bpf/progs/skb_storage.c    | 312 ++++++++++++++++
 .../selftests/bpf/progs/verifier_skb_storage.c     | 209 +++++++++++
 14 files changed, 1313 insertions(+), 1 deletion(-)