Re: [bpf-next PATCH 1/2] bpf, sockmap: add skb_adjust_room to pop bytes off ingress payload
From: John Fastabend <john.fastabend@gmail.com>
Date: 2020-09-29 15:41:49
Daniel Borkmann wrote:
On 9/26/20 6:27 AM, John Fastabend wrote:quoted
This implements a new helper skb_adjust_room() so users can push/pop extra bytes from a BPF_SK_SKB_STREAM_VERDICT program. Some protocols may include headers and other information that we may not want to include when doing a redirect from a BPF_SK_SKB_STREAM_VERDICT program. One use case is to redirect TLS packets into a receive socket that doesn't expect TLS data. In TLS case the first 13B or so contain the protocol header. With KTLS the payload is decrypted so we should be able to redirect this to a receiving socket, but the receiving socket may not be expecting to receive a TLS header and discard the data. Using the above helper we can pop the header off and put an appropriate header on the payload. This allows for creating a proxy between protocols without extra hops through the stack or userspace. So in order to fix this case add skb_adjust_room() so users can strip the header. After this the user can strip the header and an unmodified receiver thread will work correctly when data is redirected into the ingress path of a sock. Signed-off-by: John Fastabend <john.fastabend@gmail.com> --- net/core/filter.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 51 insertions(+)diff --git a/net/core/filter.c b/net/core/filter.c index 4d8dc7a31a78..d232358f1dcd 100644 --- a/net/core/filter.c +++ b/net/core/filter.c@@ -76,6 +76,7 @@ #include <net/bpf_sk_storage.h> #include <net/transp_v6.h> #include <linux/btf_ids.h> +#include <net/tls.h> static const struct bpf_func_proto * bpf_sk_base_func_proto(enum bpf_func_id func_id);@@ -3218,6 +3219,53 @@ static u32 __bpf_skb_max_len(const struct sk_buff *skb) SKB_MAX_ALLOC; } +BPF_CALL_4(sk_skb_adjust_room, struct sk_buff *, skb, s32, len_diff, + u32, mode, u64, flags) +{ + unsigned int len_diff_abs = abs(len_diff);small nit: u32
Sure.
quoted
+ bool shrink = len_diff < 0; + int ret = 0; + + if (unlikely(flags)) + return -EINVAL;Parameter 'mode' is not used here, I guess we need to reject anything non-zero?
Probably its not used.
Similarly, any interaction wrt bpf_csum_level() that was needed back then for the bpf_skb_adjust_room()?
I don't believe so because we are above csum checks at this point. Either we will put the skb data in the receive_queue for the socket or redirect it into sendpage.
quoted
+ if (unlikely(len_diff_abs > 0xfffU)) + return -EFAULT; + + if (!shrink) { + unsigned int grow = len_diff;nit: u32 or just directly len_diff?
Just use len_diff missed when I cleaned this up.
quoted
+ ret = skb_cow(skb, grow); + if (likely(!ret)) { + __skb_push(skb, len_diff_abs); + memset(skb->data, 0, len_diff_abs); + } + } else { + /* skb_ensure_writable() is not needed here, as we're + * already working on an uncloned skb. + */ + if (unlikely(!pskb_may_pull(skb, len_diff_abs))) + return -ENOMEM; + __skb_pull(skb, len_diff_abs); + } + bpf_compute_data_end_sk_skb(skb); + if (tls_sw_has_ctx_rx(skb->sk)) { + struct strp_msg *rxm = strp_msg(skb); + + rxm->full_len += len_diff;If skb_cow() failed, we still adjust rxm->full_len?
Thanks. Will just return above on error like in the else branch. I'll send a v2 shortly.