Thread (33 messages) 33 messages, 5 authors, 3d ago

Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB.

From: Kuniyuki Iwashima <kuniyu@google.com>
Date: 2026-05-26 21:22:10
Also in: bpf

On Tue, May 26, 2026 at 1:34 PM Martin KaFai Lau [off-list ref] wrote:
On Sat, May 23, 2026 at 08:29:32AM +0000, Kuniyuki Iwashima wrote:
quoted
When a TCP skb is queued to sk->sk_receive_queue, BPF SOCK_OPS
prog can be called with BPF_SOCK_OPS_RCVQ_CB.

In this hook, we want to parse the RPC descriptor in the skb
and adjust sk->sk_rcvlowat based on the RPC frame size.

However, we cannot access payload via bpf_sock_ops.data on
modern NICs with TCP header/data split on as the payload is
not placed in the linear area.

Let's support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB.

Three notes:

  1) bpf_sock_ops_kern.skb will be NULL when the BPF prog is
      invoked from recvmsg().

  2) Access to bpf_sock_ops.data will be disabled by passing
      0 end_offset to bpf_skops_init_skb().

  3) ____bpf_skb_load_bytes() is called directly instead of
     __bpf_skb_load_bytes() to allow compilers to inline it
     instead of generating a tail-call.
Some observations below.
quoted
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
v2: Explain why using ____ version instead of __
---
 net/core/filter.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)
diff --git a/net/core/filter.c b/net/core/filter.c
index 4a50fe2cd863..fa8a7c7d86eb 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7760,6 +7760,38 @@ static const struct bpf_func_proto bpf_sk_assign_proto = {
      .arg3_type      = ARG_ANYTHING,
 };

+BPF_CALL_4(bpf_sock_ops_skb_load_bytes, struct bpf_sock_ops_kern *, bpf_sock,
+        u32, offset, void *, to, u32, len)
+{
+     int err;
+
+     if (bpf_sock->op != BPF_SOCK_OPS_RCVQ_CB) {
bpf_dynptr_from_skb() and bpf_dynptr_slice() kfunc could also be considered.
One less bpf_sock->op check in filter.c to maintain and could also avoid
a data copy. There is a bpf_cast_to_kern_ctx() to get to a trusted
skops_kern pointer but this will need changes in verifier.c to get to
skops_kern->skb (e.g. in type_is_trusted_or_null) and this is the tradeoff.
Maybe a dumb question, but does it add extra cost (extra dynptr
function call?) if data overlaps two frags, or can dynptr handle it
seamlessly with a single bpf_dynptr_slice() ?

In our case, the data copy is ~16 bytes, so the cost will not be
a big problem I think.

If this new rcvq callback is added to the 'bpf_tcp_ops' proposal [1],
all this will go away. 'struct sk_buff *skb' can be directly passed to an
ops of the 'bpf_tcp_ops'. Supporting '*skb' in a struct_ops has already
been done in the bpf_qdisc.

[1]: https://lore.kernel.org/bpf/20260519215841.2984970-11-martin.lau@linux.dev/ (local)
Oh I missed the series, the struct_ops conversion looks nice !
Since this work isn't urgent, I can wait for your series if mine
churns it.

Jason's series is adding a new op, and I guess this can be
integrated too ?
https://lore.kernel.org/bpf/20260521135244.40869-5-kerneljasonxing@gmail.com/ (local)
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help