Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB.
From: Kuniyuki Iwashima <kuniyu@google.com>
Date: 2026-05-26 21:22:10
Also in:
bpf
On Tue, May 26, 2026 at 1:34 PM Martin KaFai Lau [off-list ref] wrote:
On Sat, May 23, 2026 at 08:29:32AM +0000, Kuniyuki Iwashima wrote:quoted
When a TCP skb is queued to sk->sk_receive_queue, BPF SOCK_OPS prog can be called with BPF_SOCK_OPS_RCVQ_CB. In this hook, we want to parse the RPC descriptor in the skb and adjust sk->sk_rcvlowat based on the RPC frame size. However, we cannot access payload via bpf_sock_ops.data on modern NICs with TCP header/data split on as the payload is not placed in the linear area. Let's support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB. Three notes: 1) bpf_sock_ops_kern.skb will be NULL when the BPF prog is invoked from recvmsg(). 2) Access to bpf_sock_ops.data will be disabled by passing 0 end_offset to bpf_skops_init_skb(). 3) ____bpf_skb_load_bytes() is called directly instead of __bpf_skb_load_bytes() to allow compilers to inline it instead of generating a tail-call.Some observations below.quoted
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> --- v2: Explain why using ____ version instead of __ --- net/core/filter.c | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+)diff --git a/net/core/filter.c b/net/core/filter.c index 4a50fe2cd863..fa8a7c7d86eb 100644 --- a/net/core/filter.c +++ b/net/core/filter.c@@ -7760,6 +7760,38 @@ static const struct bpf_func_proto bpf_sk_assign_proto = { .arg3_type = ARG_ANYTHING, }; +BPF_CALL_4(bpf_sock_ops_skb_load_bytes, struct bpf_sock_ops_kern *, bpf_sock, + u32, offset, void *, to, u32, len) +{ + int err; + + if (bpf_sock->op != BPF_SOCK_OPS_RCVQ_CB) {bpf_dynptr_from_skb() and bpf_dynptr_slice() kfunc could also be considered. One less bpf_sock->op check in filter.c to maintain and could also avoid a data copy. There is a bpf_cast_to_kern_ctx() to get to a trusted skops_kern pointer but this will need changes in verifier.c to get to skops_kern->skb (e.g. in type_is_trusted_or_null) and this is the tradeoff.
Maybe a dumb question, but does it add extra cost (extra dynptr function call?) if data overlaps two frags, or can dynptr handle it seamlessly with a single bpf_dynptr_slice() ? In our case, the data copy is ~16 bytes, so the cost will not be a big problem I think.
If this new rcvq callback is added to the 'bpf_tcp_ops' proposal [1], all this will go away. 'struct sk_buff *skb' can be directly passed to an ops of the 'bpf_tcp_ops'. Supporting '*skb' in a struct_ops has already been done in the bpf_qdisc. [1]: https://lore.kernel.org/bpf/20260519215841.2984970-11-martin.lau@linux.dev/ (local)
Oh I missed the series, the struct_ops conversion looks nice ! Since this work isn't urgent, I can wait for your series if mine churns it. Jason's series is adding a new op, and I guess this can be integrated too ? https://lore.kernel.org/bpf/20260521135244.40869-5-kerneljasonxing@gmail.com/ (local)