Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB.
From: Jason Xing <hidden>
Date: 2026-05-27 04:01:52
Also in:
bpf
On Wed, May 27, 2026 at 6:19 AM Martin KaFai Lau [off-list ref] wrote:
On Tue, May 26, 2026 at 02:21:56PM -0700, Kuniyuki Iwashima wrote:quoted
On Tue, May 26, 2026 at 1:34 PM Martin KaFai Lau [off-list ref] wrote:quoted
On Sat, May 23, 2026 at 08:29:32AM +0000, Kuniyuki Iwashima wrote:quoted
When a TCP skb is queued to sk->sk_receive_queue, BPF SOCK_OPS prog can be called with BPF_SOCK_OPS_RCVQ_CB. In this hook, we want to parse the RPC descriptor in the skb and adjust sk->sk_rcvlowat based on the RPC frame size. However, we cannot access payload via bpf_sock_ops.data on modern NICs with TCP header/data split on as the payload is not placed in the linear area. Let's support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB. Three notes: 1) bpf_sock_ops_kern.skb will be NULL when the BPF prog is invoked from recvmsg(). 2) Access to bpf_sock_ops.data will be disabled by passing 0 end_offset to bpf_skops_init_skb(). 3) ____bpf_skb_load_bytes() is called directly instead of __bpf_skb_load_bytes() to allow compilers to inline it instead of generating a tail-call.Some observations below.quoted
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> --- v2: Explain why using ____ version instead of __ --- net/core/filter.c | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+)diff --git a/net/core/filter.c b/net/core/filter.c index 4a50fe2cd863..fa8a7c7d86eb 100644 --- a/net/core/filter.c +++ b/net/core/filter.c@@ -7760,6 +7760,38 @@ static const struct bpf_func_proto bpf_sk_assign_proto = { .arg3_type = ARG_ANYTHING, }; +BPF_CALL_4(bpf_sock_ops_skb_load_bytes, struct bpf_sock_ops_kern *, bpf_sock, + u32, offset, void *, to, u32, len) +{ + int err; + + if (bpf_sock->op != BPF_SOCK_OPS_RCVQ_CB) {bpf_dynptr_from_skb() and bpf_dynptr_slice() kfunc could also be considered. One less bpf_sock->op check in filter.c to maintain and could also avoid a data copy. There is a bpf_cast_to_kern_ctx() to get to a trusted skops_kern pointer but this will need changes in verifier.c to get to skops_kern->skb (e.g. in type_is_trusted_or_null) and this is the tradeoff.Maybe a dumb question, but does it add extra cost (extra dynptr function call?) if data overlaps two frags, or can dynptr handle it seamlessly with a single bpf_dynptr_slice() ?Right, there is an extra bpf_dynptr_from_skb(). I don't think we have benchmarked it. If I read it correctly, unlike bpf_xdp_pointer, the skb_header_pointer will still copy even if the data is in one frag. It works well if the data is in the headlen and the worst case is to copy, which is the same as load_bytes. It is a readonly use case. Maybe the bpf prog can directly read the frag. Regardless, it is useful to have a kfunc/helper to read it.quoted
In our case, the data copy is ~16 bytes, so the cost will not be a big problem I think.quoted
If this new rcvq callback is added to the 'bpf_tcp_ops' proposal [1], all this will go away. 'struct sk_buff *skb' can be directly passed to an ops of the 'bpf_tcp_ops'. Supporting '*skb' in a struct_ops has already been done in the bpf_qdisc. [1]: https://lore.kernel.org/bpf/20260519215841.2984970-11-martin.lau@linux.dev/ (local)Oh I missed the series, the struct_ops conversion looks nice ! Since this work isn't urgent, I can wait for your series if mine churns it. Jason's series is adding a new op, and I guess this can be integrated too ? https://lore.kernel.org/bpf/20260521135244.40869-5-kerneljasonxing@gmail.com/ (local)imo, a new sock_ops cb should be added as an ops in struct_ops. For example, in patch 4 of that series, bpf_skops_rx_timestamping assigns u64 to 'u32 args[4]', which is adding tech debt to the current sock_ops interface. For the timestamping case, it could be a separate ops for the 'struct sock' instead of 'struct tcp_sock' because it should at least work for both TCP and UDP.
Sorry, I don't get it. What is the tech debt in this? And bpf_skops_rx_timestamping() only outputs the timestamps, which has nothing to do with either 'sock' or 'tcp_sock'. Could you show me what to do next? Thanks in advance. It sounds like the tx side of bpf timestamping should be adjusted accordingly? Thanks, Jason