Re: [PATCH net-next 05/11] tcp: allow mptcp to drop TS for some packets
From: Matthieu Baerts <matttbe@kernel.org>
Date: 2026-06-01 07:37:02
Also in:
lkml, mptcp
On 01/06/2026 17:26, Eric Dumazet wrote:
On Sun, May 31, 2026 at 11:52 PM Matthieu Baerts [off-list ref] wrote:quoted
Hi Eric, Thank you for the review! On 01/06/2026 16:04, Eric Dumazet wrote:quoted
On Sun, May 31, 2026 at 10:25 PM Matthieu Baerts (NGI0) [off-list ref] wrote:quoted
With TCP-timestamps (padded) taking 12 bytes and ADD_ADDR IPv6 + port taking 30 bytes, the 40-byte limit for the TCP options is reached. In this case, it is then not possible to send the address signal. The idea is to let MPTCP dropping the TCP-timestamps option for some specific packets, to be able to send some specific pure ACK carrying >28 bytes of MPTCP options, like with this specific ADD_ADDR. A new parameter is passed from tcp_established_options to the MPTCP side to indicate if the TCP TS option is used, and if it should be dropped. The next commit implements the part on MPTCP side, but split into two patches to help TCP maintainers to identify the modifications on TCP side. This feature will be controlled by a new add_addr_v6_port_drop_ts MPTCP sysctl knob. It is important to keep in mind that dropping the TCP timestamps option for one packet of the connection could eventually disrupt some middleboxes: even if it should be unlikely, they could drop the packet or even block the connection. That's why this new feature will be controlled by a sysctl knob. Note that it would be technically possible to squeeze both options into the header if the ADD_ADDR is first written, and then the TCP timestamps without the NOPs preceding it. But this means more modifications on TCP side, plus some middleboxes could still be disrupted by that. About the implementation, instead of passing a new boolean (drop_ts), another option would be to pass the whole option structure (opts), but 'struct tcp_out_options' is currently defined in tcp_output.c, and would need to be exported. Plus that means the removal of the TCP TS option would be done on the MPTCP side, and not here on the TCP side. It feels clearer to remove other TCP options from the TCP side, than hiding that from the MPTCP side. Yet an other alternative would be to pass the size already taken by the other TCP options, and have a way to drop them all when needed. But this feels better to target only the timestamps option where dropping it should be safe, even if it is currently the only option that would be set before MPTCP, when MPTCP is used. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> --- To: Neal Cardwell <ncardwell@google.com> To: Kuniyuki Iwashima <kuniyu@google.com> --- include/net/mptcp.h | 3 ++- net/ipv4/tcp_output.c | 6 +++++- net/mptcp/options.c | 3 ++- 3 files changed, 9 insertions(+), 3 deletions(-)diff --git a/include/net/mptcp.h b/include/net/mptcp.h index f7263fe2a2e4..000b6593bfa4 100644 --- a/include/net/mptcp.h +++ b/include/net/mptcp.h@@ -151,7 +151,7 @@ bool mptcp_synack_options(const struct request_sock *req, unsigned int *size, struct mptcp_out_options *opts); bool mptcp_established_options(struct sock *sk, struct sk_buff *skb, unsigned int *size, unsigned int remaining, - struct mptcp_out_options *opts); + bool *drop_ts, struct mptcp_out_options *opts); bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb); void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,@@ -270,6 +270,7 @@ static inline bool mptcp_established_options(struct sock *sk, struct sk_buff *skb, unsigned int *size, unsigned int remaining, + bool *drop_ts, struct mptcp_out_options *opts) { return false;diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index ef0c10cd31c7..53ee4c8f5f8c 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c@@ -1181,12 +1181,16 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb */ if (sk_is_mptcp(sk)) { unsigned int remaining = MAX_TCP_OPTION_SPACE - size; + bool drop_ts = opts->options & OPTION_TS; unsigned int opt_size = 0; if (mptcp_established_options(sk, skb, &opt_size, remaining, - &opts->mptcp)) { + &drop_ts, &opts->mptcp)) { opts->options |= OPTION_MPTCP; size += opt_size; + + if (drop_ts) + opts->options &= ~OPTION_TS; } }Passing local variables' addresses forces the compiler to use a stack canary in this hot function, even for non-MPTCP flows. I was about to test the following patch, which removes the current stack canary caused by MPTCP :/Sorry, I didn't know you were planning to do that. Would that be OK for you if I use an unused bit in opts->mptcp? It's a bit "hackish", but it avoids adding a new local variable address. Or do you have another idea? The modifications in net/ipv4/tcp_output.c would then be limited to:diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index ef0c10cd31c7..f4edc9c4f3fc 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c@@ -1181,12 +1181,18 @@ static unsigned int tcp_established_options(struct sock *sk, struct sk_buff *skb */ if (sk_is_mptcp(sk)) { unsigned int remaining = MAX_TCP_OPTION_SPACE - size; + bool has_ts = opts->options & OPTION_TS; unsigned int opt_size = 0; if (mptcp_established_options(sk, skb, &opt_size, remaining, - &opts->mptcp)) { + has_ts, &opts->mptcp)) { opts->options |= OPTION_MPTCP; size += opt_size; + +#if IS_ENABLED(CONFIG_MPTCP) + if (opts->mptcp.drop_ts) + opts->options &= ~OPTION_TS; +#endifSGTM, but maybe the IS_ENABLED() is not needed in this block, guarded by if (sk_is_mptcp(sk)) ?
I didn't think about that, I will check.
Also I am unsure opts->mptcp.drop_ts is cleared already before reaching tcp_established_options()?
Indeed, it is not. I explicitly reset it in mptcp_established_options, but I could do it here, that would be clearer.
quoted
} } I can also avoid adding a new parameter in mptcp_established_options (bool has_ts) by setting opts->mptcp.drop_ts before calling it, but that's clearer with this new parameter I think.quoted
$ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-92 (-92) Function old new delta tcp_options_write.isra 1423 1407 -16 mptcp_established_options 2746 2720 -26 tcp_established_options 553 503 -50 Total: Before=22110750, After=22110658, chg -0.00%Good reduction! (...)quoted
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index ef0c10cd31c71ff585a937fde37f2b08b1214b5a..594ec6ba02d5413d43842f79aefbf4d8355c4f3f100644--- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c@@ -1183,10 +1183,11 @@ static unsigned inttcp_established_options(struct sock *sk, struct sk_buff *skb unsigned int remaining = MAX_TCP_OPTION_SPACE - size; unsigned int opt_size = 0; - if (mptcp_established_options(sk, skb, &opt_size, remaining, - &opts->mptcp)) { + opt_size = mptcp_established_options(sk, skb, remaining, + &opts->mptcp); + if (opt_size) { opts->options |= OPTION_MPTCP; - size += opt_size; + size += (opt_size & 63);Nice trick! What about returning a negative number when the MPTCP option is not needed? Just to avoid playing with masks in the code?Yes, that would work, thanks. I can provide a patch in a couple of hours.
No hurry, thank you! I will wait for your patches to be applied before sending a v2. (While at it, no need to initialise opt_size to 0 here above, and there was a double ";;" in mptcp_established_options, around "return 64 + total_size;;" but this code will change anyway.) Cheers, Matt -- Sponsored by the NGI0 Core fund.