Thread: 33 messages, 5 authors

Re: [PATCH v3 bpf-next 03/11] bpf: tcp: Support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB.

From: Jason Xing <hidden>
Date: 2026-05-27 04:01:52
Also in: bpf

On Wed, May 27, 2026 at 6:19 AM Martin KaFai Lau [off-list ref] wrote:
On Tue, May 26, 2026 at 02:21:56PM -0700, Kuniyuki Iwashima wrote:
On Tue, May 26, 2026 at 1:34 PM Martin KaFai Lau [off-list ref] wrote:
On Sat, May 23, 2026 at 08:29:32AM +0000, Kuniyuki Iwashima wrote:
When a TCP skb is queued to sk->sk_receive_queue, BPF SOCK_OPS
prog can be called with BPF_SOCK_OPS_RCVQ_CB.

In this hook, we want to parse the RPC descriptor in the skb
and adjust sk->sk_rcvlowat based on the RPC frame size.

However, we cannot access the payload via bpf_sock_ops.data on
modern NICs with TCP header/data split enabled, as the payload is
not placed in the linear area.

Let's support bpf_skb_load_bytes() for BPF_SOCK_OPS_RCVQ_CB.

Three notes:

  1) bpf_sock_ops_kern.skb will be NULL when the BPF prog is
     invoked from recvmsg().

  2) Access to bpf_sock_ops.data will be disabled by passing
     0 as end_offset to bpf_skops_init_skb().

  3) ____bpf_skb_load_bytes() is called directly instead of
     __bpf_skb_load_bytes() to allow compilers to inline it
     instead of generating a tail-call.
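To make the use case in the commit message concrete, here is a minimal userspace C sketch of the parsing step a BPF_SOCK_OPS_RCVQ_CB prog might perform after copying the descriptor out with bpf_skb_load_bytes(). The 16-byte descriptor layout, RPC_MAGIC, and rpc_frame_rcvlowat() are all hypothetical: the patch does not define any RPC format, and how the result would be written to sk->sk_rcvlowat is not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte RPC descriptor: 4-byte magic, 4-byte big-endian
 * payload length, 8 reserved bytes. Not taken from the patch. */
#define RPC_DESC_LEN	16
#define RPC_MAGIC	0x52504331u	/* "RPC1", hypothetical */

/* Read a big-endian u32 from a byte buffer. */
static uint32_t get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Given the descriptor bytes (in a BPF prog these would be copied out
 * with bpf_skb_load_bytes()), return an rcvlowat covering the whole
 * frame (descriptor plus payload), bounded by 'cap'. Returns -1 if the
 * bytes are too short or do not carry the expected magic.
 */
static int64_t rpc_frame_rcvlowat(const uint8_t *desc, size_t len,
				  uint32_t cap)
{
	uint64_t frame;

	assert(desc != NULL);
	if (len < RPC_DESC_LEN || get_be32(desc) != RPC_MAGIC)
		return -1;

	frame = (uint64_t)RPC_DESC_LEN + get_be32(desc + 4);
	return frame < cap ? (int64_t)frame : (int64_t)cap;
}
```

For a descriptor announcing a 65536-byte payload and a 1 MiB cap, rpc_frame_rcvlowat() returns 65552, so a reader relying on SO_RCVLOWAT semantics would not be woken until the whole frame has been queued.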
Some observations below.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
---
v2: Explain why using ____ version instead of __
---
 net/core/filter.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)
diff --git a/net/core/filter.c b/net/core/filter.c
index 4a50fe2cd863..fa8a7c7d86eb 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7760,6 +7760,38 @@ static const struct bpf_func_proto bpf_sk_assign_proto = {
      .arg3_type      = ARG_ANYTHING,
 };

+BPF_CALL_4(bpf_sock_ops_skb_load_bytes, struct bpf_sock_ops_kern *, bpf_sock,
+        u32, offset, void *, to, u32, len)
+{
+     int err;
+
+     if (bpf_sock->op != BPF_SOCK_OPS_RCVQ_CB) {
bpf_dynptr_from_skb() and bpf_dynptr_slice() kfuncs could also be considered.
That would mean one less bpf_sock->op check in filter.c to maintain, and it
could also avoid a data copy. There is bpf_cast_to_kern_ctx() to get a trusted
skops_kern pointer, but reaching skops_kern->skb will need changes in verifier.c
(e.g. in type_is_trusted_or_null); that is the tradeoff.
Maybe a dumb question, but does it add extra cost (an extra dynptr
function call?) if the data spans two frags, or can dynptr handle it
seamlessly with a single bpf_dynptr_slice()?
Right, there is an extra bpf_dynptr_from_skb(). I don't think we have
benchmarked it.

If I read it correctly, unlike bpf_xdp_pointer(), skb_header_pointer()
will still copy even if the data is within a single frag. It avoids the
copy when the data is within headlen, and the worst case is a copy, which
is the same as load_bytes.
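The copy-versus-pointer behaviour described above can be modelled in userspace. The sketch below uses a hypothetical skb_model (a linear head plus page frags, not the kernel's struct sk_buff) and mimics skb_header_pointer(): it returns a pointer into the head when [offset, offset + len) fits within headlen, and otherwise copies into the caller's buffer, even when the bytes sit in a single frag. model_copy_bits() and model_header_pointer() are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for an skb; not the kernel layout. */
struct frag_model {
	const uint8_t *data;
	size_t len;
};

struct skb_model {
	const uint8_t *head;		/* linear area */
	size_t headlen;			/* bytes in the linear area */
	const struct frag_model *frags;	/* paged data after the head */
	size_t nr_frags;
};

/* Copy len bytes at offset into 'to', walking the head and then each
 * frag in order. Returns 0 on success, -1 if the range runs past the
 * end of the data. */
static int model_copy_bits(const struct skb_model *skb, size_t offset,
			   void *to, size_t len)
{
	uint8_t *out = to;
	size_t start = 0;	/* skb offset where the current chunk begins */
	size_t i;

	assert(skb != NULL);
	for (i = 0; i <= skb->nr_frags && len > 0; i++) {
		/* Chunk 0 is the linear head, chunk i > 0 is frags[i - 1]. */
		const uint8_t *data = i == 0 ? skb->head : skb->frags[i - 1].data;
		size_t clen = i == 0 ? skb->headlen : skb->frags[i - 1].len;

		if (offset < start + clen) {
			size_t off = offset - start;
			size_t n = clen - off;

			if (n > len)
				n = len;
			memcpy(out, data + off, n);
			out += n;
			offset += n;
			len -= n;
		}
		start += clen;
	}
	return len == 0 ? 0 : -1;
}

/* Pointer into the head if the range is linear, otherwise a copy into
 * 'buf' (which must hold len bytes); NULL if out of range. */
static const void *model_header_pointer(const struct skb_model *skb,
					size_t offset, size_t len, void *buf)
{
	if (offset <= skb->headlen && len <= skb->headlen - offset)
		return skb->head + offset;	/* no copy needed */
	return model_copy_bits(skb, offset, buf, len) ? NULL : buf;
}
```

In this model a range that spans the head and two frags costs a single call, just like a range inside one frag; both take the copy path, which matches the "worst case is a copy" point.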

It is a read-only use case, so maybe the bpf prog could read the frag
directly. Regardless, it is useful to have a kfunc/helper to read it.
In our case, the data copy is ~16 bytes, so I think the cost will
not be a big problem.

If this new rcvq callback is added to the 'bpf_tcp_ops' proposal [1],
all of this will go away. 'struct sk_buff *skb' can be passed directly
to an op of 'bpf_tcp_ops'. Supporting '*skb' in a struct_ops has already
been done in bpf_qdisc.

[1]: https://lore.kernel.org/bpf/20260519215841.2984970-11-martin.lau@linux.dev/
Oh, I missed the series; the struct_ops conversion looks nice!
Since this work isn't urgent, I can wait for your series if mine
would cause churn for it.

Jason's series adds a new op, and I guess it could be
integrated too?
https://lore.kernel.org/bpf/20260521135244.40869-5-kerneljasonxing@gmail.com/
IMO, a new sock_ops cb should be added as an op in struct_ops. For example,
in patch 4 of that series, bpf_skops_rx_timestamping assigns a u64 to 'u32
args[4]', which adds tech debt to the current sock_ops interface.
For the timestamping case, it could be a separate op for
'struct sock' instead of 'struct tcp_sock' because it should
at least work for both TCP and UDP.
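To illustrate the concern about a u64 in 'u32 args[4]': a single 32-bit slot cannot hold a 64-bit timestamp, so the value has to be either truncated or split across two slots that the BPF prog must know to recombine. The helpers put_u64() and get_u64() below are hypothetical and do not reproduce what patch 4 actually does; they only show the split/recombine step that a typed struct_ops argument would make unnecessary.

```c
#include <assert.h>
#include <stdint.h>

/* Store a u64 across two adjacent u32 slots: low half first. */
static void put_u64(uint32_t args[4], int slot, uint64_t v)
{
	assert(slot >= 0 && slot + 1 < 4);
	args[slot] = (uint32_t)v;		/* low 32 bits */
	args[slot + 1] = (uint32_t)(v >> 32);	/* high 32 bits */
}

/* Recombine the u64 that put_u64() stored at 'slot'. */
static uint64_t get_u64(const uint32_t args[4], int slot)
{
	assert(slot >= 0 && slot + 1 < 4);
	return (uint64_t)args[slot] | ((uint64_t)args[slot + 1] << 32);
}
```

The slot convention (which index holds which half) becomes part of the interface every BPF prog has to follow, which is one way to read the "tech debt" remark.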
Sorry, I don't get it. What is the tech debt in this? And
bpf_skops_rx_timestamping() only outputs the timestamps, which has
nothing to do with either 'sock' or 'tcp_sock'.

Could you show me what to do next? Thanks in advance. It sounds like
the tx side of bpf timestamping should be adjusted accordingly?

Thanks,
Jason