Re: [PATCH 1/3] bpf: add helper to check for a valid SYN cookie

From: Lorenz Bauer <hidden>
Date: 2019-02-28 15:11:24
Also in: netdev

On Tue, 26 Feb 2019 at 05:38, Martin Lau [off-list ref] wrote:

On Mon, Feb 25, 2019 at 06:26:42PM +0000, Lorenz Bauer wrote:

On Sat, 23 Feb 2019 at 00:44, Martin Lau [off-list ref] wrote:

On Fri, Feb 22, 2019 at 09:50:55AM +0000, Lorenz Bauer wrote:

Using bpf_sk_lookup_tcp, it's possible to ascertain whether a packet belongs
to a known connection. However, there is one corner case: no sockets are
created if SYN cookies are active. This means that the final ACK in the
three-way handshake (3WHS) is misclassified, since a lookup only finds the
listening socket.

Using the new helper, we can look up the listening socket via bpf_sk_lookup_tcp
and then check whether a packet is a valid SYN cookie ACK.
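To make the corner case concrete, here is a toy userspace model in C (all names are hypothetical, not kernel or BPF API). It only tracks what kind of socket a lookup for the final ACK's 4-tuple would find: with SYN cookies active nothing was created for the connection, so the only match is the listener, which on its own looks the same as for a stray ACK.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: none of these names exist in the kernel. */
enum model_match { MODEL_NONE, MODEL_LISTENER, MODEL_REQUEST, MODEL_ESTABLISHED };

/* State the stack would keep for one client connecting to one listening port. */
struct model_table {
	bool listener;     /* a socket is listening on the destination port */
	bool request_sock; /* a request sock exists for the 4-tuple */
	bool established;  /* a full socket exists for the 4-tuple */
};

/* Without cookies the SYN creates a request sock; with cookies the stack
 * only answers with a SYN-ACK and keeps no per-connection state. */
static void model_receive_syn(struct model_table *t, bool syncookies_active)
{
	if (!syncookies_active)
		t->request_sock = true;
}

/* The most specific match wins, as in an established-then-listener lookup. */
static enum model_match model_lookup(const struct model_table *t)
{
	if (t->established)
		return MODEL_ESTABLISHED;
	if (t->request_sock)
		return MODEL_REQUEST;
	if (t->listener)
		return MODEL_LISTENER;
	return MODEL_NONE;
}
```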

Signed-off-by: Lorenz Bauer <redacted>
---
 include/uapi/linux/bpf.h | 18 ++++++++++-
 net/core/filter.c        | 68 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index bcdd2474eee7..bc2af87e9621 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h

@@ -2359,6 +2359,21 @@ union bpf_attr {
  *   Return
  *           A **struct bpf_tcp_sock** pointer on success, or NULL in
  *           case of failure.
+ *
+ * int bpf_sk_check_syncookie(struct bpf_sock *sk, void *iph, u32 iph_len, struct tcphdr *th, u32 th_len)
+ *   Description
+ *           Check whether iph and th contain a valid SYN cookie ACK for
+ *           the listening socket in sk.
+ *
+ *           iph points to the start of the IPv4 or IPv6 header, while
+ *           iph_len contains sizeof(struct iphdr) or sizeof(struct ipv6hdr).
+ *
+ *           th points to the start of the TCP header, while th_len contains
+ *           sizeof(struct tcphdr).
+ *
+ *   Return
+ *           0 if iph and th are a valid SYN cookie ACK, or a negative error
+ *           otherwise.
  */
 #define __BPF_FUNC_MAPPER(FN)                \
      FN(unspec),                     \

@@ -2457,7 +2472,8 @@ union bpf_attr {
      FN(spin_lock),                  \
      FN(spin_unlock),                \
      FN(sk_fullsock),                \
-     FN(tcp_sock),
+     FN(tcp_sock),                   \
+     FN(sk_check_syncookie),

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call

diff --git a/net/core/filter.c b/net/core/filter.c
index 85749f6ec789..9e68897cc7ed 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c

@@ -5426,6 +5426,70 @@ static const struct bpf_func_proto bpf_tcp_sock_proto = {
      .arg1_type      = ARG_PTR_TO_SOCK_COMMON,
 };

+BPF_CALL_5(bpf_sk_check_syncookie, struct sock *, sk, void *, iph, u32, iph_len,

s/bpf_sk_check_syncookie/bpf_tcp_check_syncookie/

+        struct tcphdr *, th, u32, th_len)
+{
+#if IS_ENABLED(CONFIG_SYN_COOKIES)

Nit: use "#ifdef CONFIG_SYN_COOKIES" instead, so that it is clear this is a bool Kconfig option.

+     u32 cookie;
+     int ret;
+
+     if (unlikely(th_len < sizeof(*th)))
+             return -EINVAL;
+
+     /* sk_listener() allows TCP_NEW_SYN_RECV, which makes no sense here. */
+     if (sk->sk_protocol != IPPROTO_TCP || sk->sk_state != TCP_LISTEN)

From the test program in patch 3, the "sk" here is obtained from
bpf_sk_lookup_tcp(), which does a sk_to_full_sk() before returning.
AFAICT, this means bpf_sk_lookup_tcp() will return the listening sk
even if there is a request_sock. Does it make sense to check the
syncookie if there is already a request_sock?
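The effect Martin describes can be seen in a userspace mock-up of sk_to_full_sk() (the structs and state names below are simplified, hypothetical stand-ins, not the kernel's definitions). Once a request sock is mapped back to its listener, the caller can no longer tell whether a request sock existed for the tuple.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the kernel's socket states; values are illustrative. */
enum model_state { MODEL_TCP_ESTABLISHED, MODEL_TCP_LISTEN, MODEL_TCP_NEW_SYN_RECV };

struct model_sock {
	enum model_state state;
	struct model_sock *rsk_listener; /* only meaningful for request socks */
};

/* Same shape as sk_to_full_sk(): a request sock (TCP_NEW_SYN_RECV) is
 * replaced by the listener it belongs to; anything else passes through. */
static struct model_sock *model_sk_to_full_sk(struct model_sock *sk)
{
	if (sk && sk->state == MODEL_TCP_NEW_SYN_RECV)
		sk = sk->rsk_listener;
	return sk;
}
```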

No, that doesn't make a lot of sense. I hadn't realised that
bpf_sk_lookup_tcp only returns full sockets. This means we need a way to
detect that there is a request sock for a given tuple. Options I can see:

* add a reqsk_exists(tuple) helper, but then we pay the lookup cost twice
* drop the sk argument and do the necessary lookups in the helper itself,
  but that also wastes a call to __inet_lookup_listener
* skip sk_to_full_sk() in a helper and return RET_PTR_TO_SOCK_COMMON,
  but that violates a bunch of assumptions (e.g. calling bpf_sk_release
  on them)

How about creating a new lookup helper, bpf_sk"c"_lookup_tcp,
that does not call sk_to_full_sk() before returning?
Its ".ret_type" will be RET_PTR_TO_SOCK_COMMON_OR_NULL, whose
reference(-counting) state also has to be tracked in the verifier,
mainly in check_helper_call(), iirc.

The bpf_prog can then check bpf_sock->state for TCP_LISTEN,
call bpf_tcp_sock() to get the TCP listener sock and pass it to
bpf_tcp_check_syncookie().
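The decision flow proposed above can be modelled in plain userspace C for a non-SYN packet. Everything here is a hypothetical stand-in: the "lookup result" is passed in directly, the bpf_tcp_sock() step is elided, and the cookie check is reduced to a precomputed flag, since a real program would hand the IP and TCP headers to the helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's or libbpf's definitions. */
enum model_state { MODEL_TCP_ESTABLISHED, MODEL_TCP_LISTEN, MODEL_TCP_NEW_SYN_RECV };
enum model_verdict { MODEL_NOT_LOCAL, MODEL_LOCAL };

struct model_sock_common {
	enum model_state state;
};

/* Stand-in for bpf_tcp_check_syncookie(): the model takes the outcome
 * as a flag instead of validating real headers. */
static int model_check_syncookie(const struct model_sock_common *listener,
				 bool cookie_valid)
{
	(void)listener;
	return cookie_valid ? 0 : -1;
}

/* Because sk_to_full_sk() is skipped, the lookup result may be a request
 * sock; only a bare listener match needs the cookie check. */
static enum model_verdict model_classify(const struct model_sock_common *skc,
					 bool cookie_valid)
{
	if (!skc)
		return MODEL_NOT_LOCAL;  /* nothing matched the tuple */
	if (skc->state != MODEL_TCP_LISTEN)
		return MODEL_LOCAL;      /* established socket or request sock */
	return model_check_syncookie(skc, cookie_valid) == 0 ? MODEL_LOCAL
							      : MODEL_NOT_LOCAL;
}
```

How a program treats bare SYNs hitting a listener is left out of this sketch; the helper under review only concerns the final ACK.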

I've started working on this, and I've hit a snag with the reference
tracking behaviour of bpf_tcp_sock. From what I can tell, the assumption
is that a PTR_TO_TCP_SOCK doesn't need reference tracking, because it's
either skb->sk or a TCP listener. In the former case, the socket is
refcounted via the sk_buff; in the latter, we don't need to worry since
the eBPF program is called with the RCU read lock held.

However, non-listening sockets returned by bpf_sk_lookup_tcp can be
freed before the end of the eBPF program. Calling bpf_sk_lookup_tcp,
bpf_tcp_sock and then bpf_sk_release allows eBPF to gain a (read-only)
reference to a freed socket. I've attached a patch with a testcase
which illustrates this issue.

Is this the intended behaviour? If not, maybe the easiest fix would be
to make bpf_tcp_sock increase the refcount if !SOCK_RCU_FREE and require
a corresponding bpf_sk_release? That would simplify my work to add
RET_PTR_TO_SOCK_COMMON as well.
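The lifetime problem and the proposed fix can be illustrated with a toy refcount model in userspace C (hypothetical names, not kernel code). "unsafe" marks memory that may be reused before the program returns; SOCK_RCU_FREE sockets are never marked, since the RCU read lock keeps them valid for the whole program. The current bpf_tcp_sock behaviour is modelled as returning the same object without a reference, as described above.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy socket: refcnt, whether it is SOCK_RCU_FREE, and whether its memory
 * may already be reused while the program still runs. */
struct model_sock {
	int refcnt;
	bool rcu_free;
	bool unsafe;
};

static void model_sock_put(struct model_sock *sk)
{
	if (--sk->refcnt == 0 && !sk->rcu_free)
		sk->unsafe = true;
}

/* Lookup and release only touch the refcount for non-RCU sockets. */
static struct model_sock *model_sk_lookup(struct model_sock *sk)
{
	if (!sk->rcu_free)
		sk->refcnt++;
	return sk;
}

static void model_sk_release(struct model_sock *sk)
{
	if (!sk->rcu_free)
		model_sock_put(sk);
}

/* Current behaviour as described: same object, no reference taken. */
static struct model_sock *model_tcp_sock_current(struct model_sock *sk)
{
	return sk;
}

/* Proposed: take a reference unless SOCK_RCU_FREE, so the verifier can
 * demand a matching release on the returned pointer. */
static struct model_sock *model_tcp_sock_proposed(struct model_sock *sk)
{
	if (!sk->rcu_free)
		sk->refcnt++;
	return sk;
}
```

In the first scenario below, a concurrent teardown after the early release leaves the tcp_sock pointer dangling; with the proposed variant the pointer stays valid until its own release.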

For context: ultimately we want to use this to answer the question: does
this (encapsulated) packet contain a payload destined to a local socket?
Amongst the edge cases we need to handle are ICMP Packet Too Big messages
and SYN cookies. A solution would be to hide all this in an "uber" helper
that takes pointers to the L3 / L4 headers and returns a verdict, but that
seems a bit gross.

Please include this use case in the commit message.
It is useful.

+             return -EINVAL;
+
+     if (!sock_net(sk)->ipv4.sysctl_tcp_syncookies)

Should tcp_synq_no_recent_overflow(tp) be checked also?

Yes, not sure how that slipped out.
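The point of tcp_synq_no_recent_overflow() is that a listener only hands out cookies after its SYN queue has overflowed, so an ACK claiming to carry a cookie is only worth checking within a short window after such an overflow. A simplified userspace model of that time test follows, using the same wraparound-safe comparison as the kernel's time_after32() macro. `last_overflow` and `valid` are assumed stand-ins for the listener's overflow timestamp and TCP_SYNCOOKIE_VALID; the real function has extra cases (e.g. reuseport groups) that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is after b" on 32-bit tick counters. */
static bool model_time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

/* True when the last overflow is older than the validity window, i.e.
 * the listener should not be accepting cookies right now. */
static bool model_synq_no_recent_overflow(uint32_t now, uint32_t last_overflow,
					  uint32_t valid)
{
	return model_time_after32(now, last_overflow + valid);
}
```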

+             return -EINVAL;
+
+     if (!th->ack || th->rst)

How about th->syn?

Yes, I missed the fact that the callers in tcp_ipv{4,6}.c check this.
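With th->syn added, the flag test can be sketched in isolation (userspace C; hypothetical names, flags passed as plain booleans rather than read from a struct tcphdr). Only the ACK that completes the handshake, possibly carrying data, can carry a cookie, so ACK must be set and both RST and SYN clear.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Only the TCP flag bits relevant to the cookie check. */
struct model_tcp_flags {
	bool syn, ack, rst;
};

/* Returns 0 if the segment may carry a SYN cookie, -ENOENT otherwise,
 * matching the error the helper uses for this case. */
static int model_flags_may_carry_cookie(struct model_tcp_flags f)
{
	if (!f.ack || f.rst || f.syn)
		return -ENOENT;
	return 0;
}
```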

+             return -ENOENT;
+
+     cookie = ntohl(th->ack_seq) - 1;
+
+     switch (sk->sk_family) {
+     case AF_INET:
+             if (unlikely(iph_len < sizeof(struct iphdr)))
+                     return -EINVAL;
+
+             ret = __cookie_v4_check((struct iphdr *)iph, th, cookie);
+             break;
+
+#if IS_ENABLED(CONFIG_IPV6)
+     case AF_INET6:
+             if (unlikely(iph_len < sizeof(struct ipv6hdr)))
+                     return -EINVAL;
+
+             ret = __cookie_v6_check((struct ipv6hdr *)iph, th, cookie);
+             break;
+#endif /* CONFIG_IPV6 */
+
+     default:
+             return -EPROTONOSUPPORT;
+     }
+
+     if (ret > 0)
+             return 0;
+
+     return -ENOENT;
+#else
+     return -ENOTSUP;
+#endif
+}
+
+static const struct bpf_func_proto bpf_sk_check_syncookie_proto = {
+     .func           = bpf_sk_check_syncookie,
+     .gpl_only       = true,
+     .pkt_access     = true,
+     .ret_type       = RET_INTEGER,
+     .arg1_type      = ARG_PTR_TO_SOCKET,

I think it should be ARG_PTR_TO_TCP_SOCK

+     .arg2_type      = ARG_PTR_TO_MEM,
+     .arg3_type      = ARG_CONST_SIZE,
+     .arg4_type      = ARG_PTR_TO_MEM,
+     .arg5_type      = ARG_CONST_SIZE,
+};
+
 #endif /* CONFIG_INET */
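The arithmetic and validation steps of the helper above can be checked outside the kernel with a small userspace model (hypothetical names; header sizes hard-coded to the option-less sizes of struct iphdr, struct ipv6hdr and struct tcphdr). In a cookie SYN-ACK the server's ISN is the cookie, and the client's ACK acknowledges ISN + 1, hence `ntohl(th->ack_seq) - 1`; unsigned arithmetic keeps the ISN = 0xffffffff case correct. The `ret > 0` test in the patch reflects that __cookie_v{4,6}_check() return the decoded MSS on success and 0 on failure.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>

#define MODEL_IPV4_HDR_LEN 20u /* sizeof(struct iphdr), no options */
#define MODEL_IPV6_HDR_LEN 40u /* sizeof(struct ipv6hdr) */
#define MODEL_TCP_HDR_LEN  20u /* sizeof(struct tcphdr), no options */

/* Recover the cookie from a network-order ack_seq field. */
static uint32_t model_cookie_from_ack(uint32_t ack_seq_be)
{
	return ntohl(ack_seq_be) - 1;
}

/* Same length and family checks, in the same order, as the helper. */
static int model_check_lengths(int family, uint32_t iph_len, uint32_t th_len)
{
	if (th_len < MODEL_TCP_HDR_LEN)
		return -EINVAL;
	switch (family) {
	case AF_INET:
		return iph_len < MODEL_IPV4_HDR_LEN ? -EINVAL : 0;
	case AF_INET6:
		return iph_len < MODEL_IPV6_HDR_LEN ? -EINVAL : 0;
	default:
		return -EPROTONOSUPPORT;
	}
}

/* Fold the MSS-or-zero result into 0 / -ENOENT. */
static int model_map_cookie_result(int mss)
{
	return mss > 0 ? 0 : -ENOENT;
}
```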



--
Lorenz Bauer  |  Systems Engineer
25 Lavington St., London SE1 0NZ

http://www.cloudflare.com

---
 tools/testing/selftests/bpf/verifier/sock.c | 23 +++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/tools/testing/selftests/bpf/verifier/sock.c b/tools/testing/selftests/bpf/verifier/sock.c
index 0ddfdf76aba5..3307cca6bdd5 100644
--- a/tools/testing/selftests/bpf/verifier/sock.c
+++ b/tools/testing/selftests/bpf/verifier/sock.c

@@ -382,3 +382,26 @@
        .result = REJECT,
        .errstr = "type=tcp_sock expected=sock",
 },
+{
+       "use bpf_tcp_sock after bpf_sk_release",
+       .insns = {
+       BPF_SK_LOOKUP,
+       BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
+       BPF_EXIT_INSN(),
+       BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+       BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+       BPF_EMIT_CALL(BPF_FUNC_tcp_sock),
+       BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 3),
+       BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+       BPF_EMIT_CALL(BPF_FUNC_sk_release),
+       BPF_EXIT_INSN(),
+       BPF_MOV64_REG(BPF_REG_7, BPF_REG_0),
+       BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+       BPF_EMIT_CALL(BPF_FUNC_sk_release),
+       BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_7, offsetof(struct bpf_tcp_sock, snd_cwnd)),
+       BPF_EXIT_INSN(),
+       },
+       .prog_type = BPF_PROG_TYPE_SCHED_CLS,
+       .result = REJECT,
+       .errstr = "bogus",
+},
