Re: [PATCH v3 bpf-next 6/9] bpf: tcp: Allow bpf prog to write and parse TCP header option
From: Eric Dumazet <edumazet@google.com>
Date: 2020-07-31 16:07:14
Also in: bpf
On Thu, Jul 30, 2020 at 1:57 PM Martin KaFai Lau [off-list ref] wrote:
The earlier effort in BPF-TCP-CC allows the TCP Congestion Control
algorithm to be written in BPF. It opens up opportunities to allow
a faster turnaround time in testing/releasing new congestion control
ideas to a production environment.
The same flexibility can be extended to writing TCP header options.
It is not uncommon that people want to test new TCP header options
to improve TCP performance. Another use case is data centers,
which are more controlled environments and have more flexibility in
putting header options in place for internal-only use.
For example, we want to test the idea of putting the maximum delayed
ACK in a TCP header option, which is similar to a draft RFC proposal [1].
This patch introduces the necessary BPF API and uses it in the
TCP stack to allow a BPF_PROG_TYPE_SOCK_OPS program to parse
and write TCP header options. It currently supports most
TCP packet types except RST.
Supported TCP header option:
───────────────────────────
This patch allows the bpf-prog to write any option kind.
Different bpf-progs can each write their own option by calling the new
helper bpf_store_hdr_opt(). The helper will ensure there is no duplicated
option in the header.
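
To make the "no duplicated option" rule concrete, here is a minimal
user-space sketch, not the kernel implementation. store_opt_sketch() and
opt_present() are hypothetical names, and the error codes are chosen for
this illustration only. It assumes the usual kind/length/value option
layout and refuses to append an option whose kind is already present in
the option space.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Walk the already-written options; return 1 if 'kind' is present. */
static int opt_present(const uint8_t *opts, size_t used, uint8_t kind)
{
	size_t i = 0;

	while (i < used) {
		uint8_t k = opts[i];

		if (k == 0)		/* End of Option List */
			break;
		if (k == 1) {		/* NOP is a single byte */
			i++;
			continue;
		}
		/* Every other kind carries a length byte covering kind+len+data. */
		if (i + 1 >= used || opts[i + 1] < 2 || i + opts[i + 1] > used)
			break;		/* malformed, stop scanning */
		if (k == kind)
			return 1;
		i += opts[i + 1];
	}
	return 0;
}

/* Hypothetical helper: append 'opt' (kind, len, data...) to 'opts'.
 * Returns 0 on success or a negative errno picked for this sketch.
 */
static int store_opt_sketch(uint8_t *opts, size_t *used, size_t cap,
			    const uint8_t *opt, size_t optlen)
{
	if (*used > cap || optlen < 2 || opt[0] <= 1 || opt[1] != optlen)
		return -EINVAL;
	if (opt_present(opts, *used, opt[0]))
		return -EEXIST;		/* same kind already written */
	if (optlen > cap - *used)
		return -ENOSPC;		/* not enough option space left */
	memcpy(opts + *used, opt, optlen);
	*used += optlen;
	return 0;
}
```

With an 8-byte option space, writing a 4-byte option of kind 254 succeeds
once, and a second write of the same kind is rejected as a duplicate.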
Allowing the bpf-prog to write any option kind gives it a lot of
flexibility. Different bpf-progs can write their
own option kinds. It could also allow a bpf-prog to support a
recently standardized option on an older kernel.
Sockops Callback Flags:
──────────────────────
The header parsing and writing callback can be turned on
by enabling a few newly added callback flags:
BPF_SOCK_OPS_PARSE_UNKNOWN_HDR_OPT_CB_FLAG:
Call bpf when the kernel has received a header option that
the kernel cannot handle. It is useful when the peer doesn't
send bpf-options very often.
The bpf-prog can inspect the received header through sock_ops->skb_data,
which covers the whole header (including the fixed fields like
ports, flags, etc.), or
use the new bpf_load_hdr_opt() to search for a particular TCP
header option.
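
As an illustration of what "search for a particular TCP header option"
involves, the following is a minimal user-space sketch that walks a raw
option buffer by kind. find_tcp_opt() is a hypothetical name and this is
not the code behind bpf_load_hdr_opt(). It only assumes the standard TCP
option encoding (single-byte EOL and NOP, kind/length/value otherwise).

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL 0	/* End of Option List */
#define OPT_NOP 1	/* No-Operation padding */

/* Hypothetical helper: return a pointer to the first option of the
 * given kind inside opts[0..len), or NULL if it is absent or the
 * buffer is malformed before the option is reached.
 */
static const uint8_t *find_tcp_opt(const uint8_t *opts, size_t len,
				   uint8_t kind)
{
	size_t i = 0;

	while (i < len) {
		uint8_t k = opts[i];
		uint8_t olen;

		if (k == OPT_EOL)
			break;
		if (k == OPT_NOP) {
			i++;
			continue;
		}
		if (i + 1 >= len)
			break;			/* length byte is missing */
		olen = opts[i + 1];
		if (olen < 2 || i + olen > len)
			break;			/* bogus or truncated option */
		if (k == kind)
			return &opts[i];
		i += olen;
	}
	return NULL;
}
```

For a buffer holding two NOPs, a 10-byte timestamp option (kind 8) and a
4-byte experimental option (kind 254), a search for kind 254 returns a
pointer to offset 12, and a search for an absent kind returns NULL.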
[1]: draft-wang-tcpm-low-latency-opt-00
https://tools.ietf.org/html/draft-wang-tcpm-low-latency-opt-00
Signed-off-by: Martin KaFai Lau <redacted>
---
include/linux/bpf-cgroup.h | 25 +++
include/linux/filter.h | 4 +
include/net/tcp.h | 53 ++++-
include/uapi/linux/bpf.h | 231 ++++++++++++++++++++-
net/core/filter.c | 365 +++++++++++++++++++++++++++++++++
net/ipv4/tcp_fastopen.c | 2 +-
net/ipv4/tcp_input.c | 86 +++++++-
net/ipv4/tcp_ipv4.c | 3 +-
net/ipv4/tcp_minisocks.c | 1 +
net/ipv4/tcp_output.c | 194 ++++++++++++++++--
net/ipv6/tcp_ipv6.c | 3 +-
tools/include/uapi/linux/bpf.h | 231 ++++++++++++++++++++-
12 files changed, 1171 insertions(+), 27 deletions(-)

This is a truly gigantic patch. Could you split it into maybe two parts?
This way I could focus on the TCP changes and let eBPF experts focus on the BPF changes.