Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for... | netdev

[PATCH v3 bpf-next 00/11] bpf: Add SOCK_OPS hooks for TCP AutoLOWAT. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-23
[PATCH v3 bpf-next 01/11] selftest: bpf: Use BPF_SOCK_OPS_ALL_CB_FLAGS + 1 for bad_cb_test_rv. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-23
Re: [PATCH v3 bpf-next 01/11] selftest: bpf: Use BPF_SOCK_OPS_ALL_CB_FLAGS + 1 for bad_cb_test_rv. · bot+bpf-ci@kernel.org · 2026-05-23
[PATCH v3 bpf-next 02/11] bpf: tcp: Introduce BPF_SOCK_OPS_RCVQ_CB. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-23
[PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-23
Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB. · Martin KaFai Lau <martin.lau@linux.dev> · 2026-05-26
Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-26
Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB. · Martin KaFai Lau <martin.lau@linux.dev> · 2026-05-26
Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB. · Jason Xing <hidden> · 2026-05-27
Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB. · Martin KaFai Lau <martin.lau@linux.dev> · 2026-05-27
Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB. · Martin KaFai Lau <martin.lau@linux.dev> · 2026-05-27
Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-27
Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB. · Martin KaFai Lau <martin.lau@linux.dev> · 2026-05-28
Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB. · Jason Xing <hidden> · 2026-05-28
[PATCH v3 bpf-next 04/11] tcp: Split out __tcp_set_rcvlowat(). · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-23
[PATCH v3 bpf-next 05/11] bpf: tcp: Add kfunc to adjust sk->sk_rcvlowat. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-23
Re: [PATCH v3 bpf-next 05/11] bpf: tcp: Add kfunc to adjust sk->sk_rcvlowat. · bot+bpf-ci@kernel.org · 2026-05-23
[PATCH v3 bpf-next 06/11] bpf: tcp: Make BPF_SOCK_OPS_RCVQ_CB and SOCKMAP mutually exclusive. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-23
Re: [PATCH v3 bpf-next 06/11] bpf: tcp: Make BPF_SOCK_OPS_RCVQ_CB and SOCKMAP mutually exclusive. · bot+bpf-ci@kernel.org · 2026-05-23
Re: [PATCH v3 bpf-next 06/11] bpf: tcp: Make BPF_SOCK_OPS_RCVQ_CB and SOCKMAP mutually exclusive. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-24
[PATCH v3 bpf-next 08/11] bpf: tcp: Reject BPF_SOCK_OPS_RCVQ_CB if receive queue is not empty. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-23
Re: [PATCH v3 bpf-next 08/11] bpf: tcp: Reject BPF_SOCK_OPS_RCVQ_CB if receive queue is not empty. · bot+bpf-ci@kernel.org · 2026-05-23
[PATCH v3 bpf-next 07/11] bpf: mptcp: Don't support BPF_SOCK_OPS_RCVQ_CB. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-23
[PATCH v3 bpf-next 09/11] bpf: tcp: Factorise bpf_skops_established(). · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-23
[PATCH v3 bpf-next 10/11] bpf: tcp: Add SOCK_OPS rcvlowat hook. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-23
Re: [PATCH v3 bpf-next 10/11] bpf: tcp: Add SOCK_OPS rcvlowat hook. · Martin KaFai Lau <martin.lau@linux.dev> · 2026-05-26
Re: [PATCH v3 bpf-next 10/11] bpf: tcp: Add SOCK_OPS rcvlowat hook. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-26
Re: [PATCH v3 bpf-next 10/11] bpf: tcp: Add SOCK_OPS rcvlowat hook. · Amery Hung <hidden> · 2026-05-26
Re: [PATCH v3 bpf-next 10/11] bpf: tcp: Add SOCK_OPS rcvlowat hook. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-26
[PATCH v3 bpf-next 11/11] selftest: bpf: Add test for BPF_SOCK_OPS_RCVQ_CB. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-23
Re: [PATCH v3 bpf-next 11/11] selftest: bpf: Add test for BPF_SOCK_OPS_RCVQ_CB. · bot+bpf-ci@kernel.org · 2026-05-23
Re: [PATCH v3 bpf-next 11/11] selftest: bpf: Add test for BPF_SOCK_OPS_RCVQ_CB. · Kuniyuki Iwashima <kuniyu@google.com> · 2026-05-24
Re: [PATCH v3 bpf-next 11/11] selftest: bpf: Add test for BPF_SOCK_OPS_RCVQ_CB. · Martin KaFai Lau <martin.lau@linux.dev> · 2026-05-26

Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB.

From: Jason Xing <hidden>
Date: 2026-05-27 04:01:52
Also in: bpf

On Wed, May 27, 2026 at 6:19 AM Martin KaFai Lau [off-list ref] wrote:

On Tue, May 26, 2026 at 02:21:56PM -0700, Kuniyuki Iwashima wrote:

On Tue, May 26, 2026 at 1:34 PM Martin KaFai Lau [off-list ref] wrote:

On Sat, May 23, 2026 at 08:29:32AM +0000, Kuniyuki Iwashima wrote:

When a TCP skb is queued to sk->sk_receive_queue, BPF SOCK_OPS
prog can be called with BPF_SOCK_OPS_RCVQ_CB.

In this hook, we want to parse the RPC descriptor in the skb
and adjust sk->sk_rcvlowat based on the RPC frame size.

However, we cannot access the payload via bpf_sock_ops.data on
modern NICs with TCP header/data split enabled, as the payload
is not placed in the linear area.

Let's support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB.

Three notes:

  1) bpf_sock_ops_kern.skb will be NULL when the BPF prog is
      invoked from recvmsg().

  2) Access to bpf_sock_ops.data will be disabled by passing
      0 end_offset to bpf_skops_init_skb().

  3) ____bpf_skb_load_bytes() is called directly instead of
     __bpf_skb_load_bytes() to allow compilers to inline it
     instead of generating a tail-call.
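
To make the motivation above concrete, here is a minimal userspace sketch of why a copying helper is needed when the descriptor is not in the linear area. Everything in it is an assumption for illustration: `struct toy_skb`, `toy_load_bytes()`, `toy_frame_lowat()` and the 16-byte descriptor layout (big-endian body length at bytes 4..7) are hypothetical and are not taken from the patch or from any real RPC protocol.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for an skb: a linear area plus one page fragment. */
struct toy_skb {
	const uint8_t *linear;
	size_t headlen;		/* bytes in the linear area */
	const uint8_t *frag;
	size_t frag_len;	/* bytes in the fragment */
};

/* Copy len bytes starting at offset, crossing the linear/frag boundary
 * if needed. Returns 0 on success, -1 if the range is out of bounds.
 */
static int toy_load_bytes(const struct toy_skb *skb, size_t offset,
			  void *to, size_t len)
{
	size_t total = skb->headlen + skb->frag_len;
	uint8_t *dst = to;

	if (offset > total || len > total - offset)
		return -1;

	if (offset < skb->headlen) {
		size_t n = skb->headlen - offset;

		if (n > len)
			n = len;
		memcpy(dst, skb->linear + offset, n);
		dst += n;
		offset += n;
		len -= n;
	}
	if (len)
		memcpy(dst, skb->frag + (offset - skb->headlen), len);
	return 0;
}

/* Hypothetical descriptor: 16 bytes, big-endian u32 body length at byte 4. */
#define TOY_HDR_LEN 16

/* Returns header + body size (a candidate low-water mark), or -1 on error. */
static int64_t toy_frame_lowat(const struct toy_skb *skb, size_t offset)
{
	uint8_t hdr[TOY_HDR_LEN];
	uint32_t body;

	if (toy_load_bytes(skb, offset, hdr, sizeof(hdr)))
		return -1;

	body = (uint32_t)hdr[4] << 24 | (uint32_t)hdr[5] << 16 |
	       (uint32_t)hdr[6] << 8 | (uint32_t)hdr[7];
	return (int64_t)TOY_HDR_LEN + body;
}
```

With a 6-byte linear area, the length field straddles the boundary, so reading it through a pointer into the linear area alone would not work, while the copy sees all 16 bytes.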

Some observations below.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
v2: Explain why using ____ version instead of __
---
 net/core/filter.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 4a50fe2cd863..fa8a7c7d86eb 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c

@@ -7760,6 +7760,38 @@ static const struct bpf_func_proto bpf_sk_assign_proto = {
      .arg3_type      = ARG_ANYTHING,
 };

+BPF_CALL_4(bpf_sock_ops_skb_load_bytes, struct bpf_sock_ops_kern *, bpf_sock,
+        u32, offset, void *, to, u32, len)
+{
+     int err;
+
+     if (bpf_sock->op != BPF_SOCK_OPS_RCVQ_CB) {

The bpf_dynptr_from_skb() and bpf_dynptr_slice() kfuncs could also be considered.
One less bpf_sock->op check in filter.c to maintain and could also avoid
a data copy. There is a bpf_cast_to_kern_ctx() to get to a trusted
skops_kern pointer but this will need changes in verifier.c to get to
skops_kern->skb (e.g. in type_is_trusted_or_null) and this is the tradeoff.

Maybe a dumb question, but does it add extra cost (extra dynptr
function call?) if data overlaps two frags, or can dynptr handle it
seamlessly with a single bpf_dynptr_slice()?

Right, there is an extra bpf_dynptr_from_skb(). I don't think we have
benchmarked it.

If I read it correctly, unlike bpf_xdp_pointer(), skb_header_pointer()
will still copy even if the data is in one frag. It works well if the data
is within the headlen, and the worst case is a copy, which is the same as
load_bytes.

It is a readonly use case. Maybe the bpf prog can directly read the frag.
Regardless, it is useful to have a kfunc/helper to read it.
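
The copy-versus-pointer behaviour discussed above can be modelled in userspace as follows. This is only a sketch of the semantics as described in this thread, not kernel code: `struct toy_skb` and `toy_header_pointer()` are hypothetical names, and the toy skb has a single fragment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for an skb: a linear area plus one page fragment. */
struct toy_skb {
	const uint8_t *linear;
	size_t headlen;		/* bytes in the linear area */
	const uint8_t *frag;
	size_t frag_len;	/* bytes in the fragment */
};

/* Return a pointer to len bytes at offset.
 * - Range fully inside the linear area: pointer into it, no copy.
 * - Anything else in bounds: copy into buf and return buf.
 * - Out of bounds: NULL.
 */
static const void *toy_header_pointer(const struct toy_skb *skb,
				      size_t offset, size_t len, void *buf)
{
	size_t total = skb->headlen + skb->frag_len;
	uint8_t *dst = buf;
	size_t n = 0;

	if (offset > total || len > total - offset)
		return NULL;

	/* Fast path: the whole range is in the linear area. */
	if (offset <= skb->headlen && len <= skb->headlen - offset)
		return skb->linear + offset;

	/* Slow path: copy, even when the range sits inside one fragment. */
	if (offset < skb->headlen) {
		n = skb->headlen - offset;	/* n < len, fast path failed */
		memcpy(dst, skb->linear + offset, n);
	}
	memcpy(dst + n, skb->frag + (offset + n - skb->headlen), len - n);
	return buf;
}
```

In this model a caller that only reads data in the linear area never pays for a copy, while a read that lands in the fragment costs the same as a load_bytes-style copy.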

In our case, the data copy is ~16 bytes, so I think the cost will not
be a big problem.

If this new rcvq callback is added to the 'bpf_tcp_ops' proposal [1],
all this will go away. 'struct sk_buff *skb' can be directly passed to an
ops of the 'bpf_tcp_ops'. Supporting '*skb' in a struct_ops has already
been done in the bpf_qdisc.

[1]: https://lore.kernel.org/bpf/20260519215841.2984970-11-martin.lau@linux.dev/

Oh, I missed the series; the struct_ops conversion looks nice!
Since this work isn't urgent, I can wait for your series if mine
churns it.

Jason's series is adding a new op, and I guess this can be
integrated too?
https://lore.kernel.org/bpf/20260521135244.40869-5-kerneljasonxing@gmail.com/

imo, a new sock_ops cb should be added as an ops in struct_ops. For example,
in patch 4 of that series, bpf_skops_rx_timestamping assigns u64 to 'u32
args[4]', which is adding tech debt to the current sock_ops interface.
For the timestamping case, it could be a separate ops for the
'struct sock' instead of 'struct tcp_sock' because it should
at least work for both TCP and UDP.
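
One possible reading of the 'u64 into u32 args[4]' concern, sketched in userspace. This is an assumption about what is meant and is not taken from the timestamping patch: `struct toy_sock_ops`, `toy_store_u64()` and `toy_load_u64()` are hypothetical. A single u32 slot cannot hold a 64-bit value, so the value has to be split across two slots under a convention that both the kernel side and every BPF prog must know.

```c
#include <stdint.h>

/* Toy stand-in for the args array exposed to SOCK_OPS progs. */
struct toy_sock_ops {
	uint32_t args[4];
};

/* Producer side: split a 64-bit value across two 32-bit slots.
 * Assumed convention: low half in args[0], high half in args[1].
 */
static void toy_store_u64(struct toy_sock_ops *ops, uint64_t v)
{
	ops->args[0] = (uint32_t)v;
	ops->args[1] = (uint32_t)(v >> 32);
}

/* Consumer side: must apply the same convention to rebuild the value. */
static uint64_t toy_load_u64(const struct toy_sock_ops *ops)
{
	return (uint64_t)ops->args[1] << 32 | ops->args[0];
}
```

A typed argument in a struct_ops callback would carry the 64-bit value directly, with no slot convention to agree on.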

Sorry, I don't get it. What is the tech debt in this? And
bpf_skops_rx_timestamping() only outputs the timestamps, which have
nothing to do with either 'sock' or 'tcp_sock'.

Could you show me what to do next? Thanks in advance. It sounds like
the tx side of bpf timestamping should be adjusted accordingly?

Thanks,
Jason
