Thread (88 messages) 88 messages, 5 authors, 2024-11-07

Re: [PATCH net-next v3 10/14] net-timestamp: add basic support with tskey offset

From: Jason Xing <hidden>
Date: 2024-10-31 03:27:40
Also in: bpf

On Thu, Oct 31, 2024 at 10:41 AM Jason Xing [off-list ref] wrote:
On Thu, Oct 31, 2024 at 9:17 AM Martin KaFai Lau [off-list ref] wrote:
quoted
On 10/29/24 11:50 PM, Jason Xing wrote:
quoted
On Wed, Oct 30, 2024 at 1:42 PM Martin KaFai Lau [off-list ref] wrote:
quoted
On 10/28/24 4:05 AM, Jason Xing wrote:
quoted
+/* Used to track the tskey for bpf extension
+ *
+ * @sk_tskey: bpf extension can use it only when no application uses.
+ *            Application can use it directly regardless of bpf extension.
+ *
+ * There are three strategies:
+ * 1) If we've already set through setsockopt() and here we're going to set
+ *    OPT_ID for bpf use, we will not re-initialize the @sk_tskey and will
+ *    keep the record of delta between the current "key" and previous key.
+ * 2) If we've already set through bpf_setsockopt() and here we're going to
+ *    set for application use, we will record the delta first and then
+ *    override/initialize the @sk_tskey.
+ * 3) other cases, which means only either of them takes effect, so initialize
+ *    everything simplely.
+ */
+static long int sock_calculate_tskey_offset(struct sock *sk, int val, int bpf_type)
+{
+     u32 tskey;
+
+     if (sk_is_tcp(sk)) {
+             if ((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN))
+                     return -EINVAL;
+
+             if (val & SOF_TIMESTAMPING_OPT_ID_TCP)
+                     tskey = tcp_sk(sk)->write_seq;
+             else
+                     tskey = tcp_sk(sk)->snd_una;
+     } else {
+             tskey = 0;
+     }
+
+     if (bpf_type && (sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)) {
+             sk->sk_tskey_bpf_offset = tskey - atomic_read(&sk->sk_tskey);
+             return 0;
+     } else if (!bpf_type && (sk->sk_tsflags_bpf & SOF_TIMESTAMPING_OPT_ID)) {
+             sk->sk_tskey_bpf_offset = atomic_read(&sk->sk_tskey) - tskey;
+     } else {
+             sk->sk_tskey_bpf_offset = 0;
+     }
+
+     return tskey;
+}
Before diving into this route, the bpf prog can peek into the tcp seq no in the
skb. It can also look at the sk->sk_tskey for UDP socket. Can you explain why
those are not enough information for the bpf prog?
Well, it does make sense. It seems we don't need to implement tskey
for this bpf feature...

Due to lack of enough knowledge of bpf, could you provide more hints
that I can follow to write a bpf program to print more information
from the skb? Like in the last patch of this series, in
tools/testing/selftests/bpf/prog_tests/so_timestamping.c, do we have a
feasible way to do that?
The bpf-prog@sendmsg() will be run to capture a timestamp for sendmsg().
When running the bpf-prog@sendmsg(), the skb can be set to the "struct
bpf_sock_ops_kern sock_ops;" which is passed to the sockops prog. Take a look at
bpf_skops_write_hdr_opt().
Thanks. I see the skb field in struct bpf_sock_ops_kern.
quoted
bpf prog cannot directly access the skops->skb now. It is because the sockops
prog sees the uapi "struct bpf_sock_ops" instead of "struct
bpf_sock_ops(_kern)". The conversion is done in sock_ops_convert_ctx_access. It
is an old way before BTF. I don't want to extend the uapi "struct bpf_sock_ops".
Oh, so it seems we cannot use this way, right?

I also noticed a use case that allow users to get the information from one skb:
"int BPF_PROG(trace_netif_receive_skb, struct sk_buff *skb)" in
tools/testing/selftests/bpf/progs/netif_receive_skb.c
But it requires us to add the tracepoint in __skb_tstamp_tx() first.
Two months ago, I was planning to use a tracepoint for some people who
find it difficult to deploy bpf.
quoted
Instead, use bpf_cast_to_kern_ctx((struct bpf_sock_ops *)skops_ctx) to get a
trusted "struct bpf_sock_ops(_kern) *skops" pointer. Then it can access the
skops->skb.
Let me spend some time on it. Thanks.
quoted
afaik, the tcb->seq should be available already during sendmsg. it
should be able to get it from TCP_SKB_CB(skb)->seq with the bpf_core_cast. Take
a look at the existing examples of bpf_core_cast.

The same goes for the skb->data. It can use the bpf_dynptr_from_skb(). It is not
available to skops program now but should be easy to expose.
I wonder what the use of skb->data is here.
quoted
The bpf prog wants to calculate the delay between [sendmsg, SCHED], [SCHED,
SND], [SND, ACK]. It is why (at least in my mental model) a key is needed to
co-relate the sendmsg, SCHED, SND, and ACK timestamp. The tcp seqno could be
served as that key.

All that said, while looking at tcp_tx_timestamp() again, there is always
"shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1;". shinfo->tskey can be
used directly as-is by the bpf prog. I think now I am missing why the bpf prog
needs the sk_tskey in the sk?
As you said, tcp seqno could be treated as the key, but it leaks the
information in TCP layer to users. Please see the commit:
commit 4ed2d765dfaccff5ebdac68e2064b59125033a3b
Author: Willem de Bruijn [off-list ref]
Date:   Mon Aug 4 22:11:49 2014 -0400

    net-timestamp: TCP timestamping
...
    - To avoid leaking the absolute seqno to userspace, the offset
    returned in ee_data must always be relative. It is an offset between
    an skb and sk field.

It has to be computed in the kernel before reporting to the user space, I think.
Well, I'm thinking since the BPF program can only be used by _admin_,
we will not take any risk even if the raw seq is exported to the BPF
program.

Willem, I would like to know your opinions about this point (about
whether we can export the raw seqno or not). Thanks.

Thanks,
Jason
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help