Re: Performance regressions in TCP_STREAM tests in Linux 4.15 (and later)
From: Eric Dumazet <hidden>
Date: 2018-05-02 22:41:19
On 05/02/2018 02:47 PM, Michael Wenig wrote:
After applying Eric's proposed change (see below) to a 4.17 RC3 kernel, the regressions that we had observed in our TCP_STREAM small message tests with TCP_NODELAY enabled are now drastically reduced. Instead of the original 3x thruput and cpu cost regressions, the regression depth is now < 10% for thruput and between 10% - 20% for cpu cost. The improvements in the TCP_RR tests that we had observed after Eric's original commit are not impacted by the change. It would be great if this change could make it into a patch.
Thanks a lot for testing, I will submit this patch after more tests on my side.
Michael Wenig
VMware Performance Engineering

-----Original Message-----
From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
Sent: Monday, April 30, 2018 10:48 AM
To: Ben Greear <redacted>; Steven Rostedt <rostedt@goodmis.org>; Michael Wenig <redacted>
Cc: netdev@vger.kernel.org; Shilpi Agarwal <redacted>; Boon Ang <redacted>; Darren Hart <redacted>; Steven Rostedt <redacted>; Abdul Anshad Azeez <redacted>
Subject: Re: Performance regressions in TCP_STREAM tests in Linux 4.15 (and later)

On 04/30/2018 09:36 AM, Eric Dumazet wrote:
On 04/30/2018 09:14 AM, Ben Greear wrote:
On 04/27/2018 08:11 PM, Steven Rostedt wrote:
We'd like this email archived in netdev list, but since netdev is notorious for blocking outlook email as spam, it didn't go through. So I'm replying here to help get it into the archives.

Thanks!

-- Steve

On Fri, 27 Apr 2018 23:05:46 +0000 Michael Wenig [off-list ref] wrote:
As part of VMware's performance testing with the Linux 4.15 kernel, we identified CPU cost and throughput regressions when comparing to the Linux 4.14 kernel. The impacted test cases are mostly TCP_STREAM send tests when using small message sizes. The regressions are significant (up to 3x) and were tracked down to be a side effect of Eric Dumazet's RB tree changes that went into the Linux 4.15 kernel. Further investigation showed our use of the TCP_NODELAY flag in conjunction with Eric's change caused the regressions to show, and simply disabling TCP_NODELAY brought performance back to normal. Eric's change also resulted in significant improvements in our TCP_RR test cases.

Based on these results, our theory is that Eric's change made the system overall faster (reduced latency) but as a side effect less aggregation is happening (with TCP_NODELAY) and that results in lower throughput. Previously, even though TCP_NODELAY was set, the system was slower and we still got some benefit of aggregation. Aggregation helps in better efficiency and higher throughput although it can increase the latency. If you are seeing a regression in your application throughput after this change, using TCP_NODELAY might help bring performance back however that might increase latency.

I guess you mean _disabling_ TCP_NODELAY instead of _using_ TCP_NODELAY?

Yeah, I guess auto-corking does not work as intended.

I would try the following patch :

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 44be7f43455e4aefde8db61e2d941a69abcc642a..c9d00ef54deca15d5760bcbe154001a96fa1e2a7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -697,7 +697,7 @@ static bool tcp_should_autocork(struct sock *sk, struct sk_buff *skb,
 {
 	return skb->len < size_goal &&
 	       sock_net(sk)->ipv4.sysctl_tcp_autocorking &&
-	       skb != tcp_write_queue_head(sk) &&
+	       !tcp_rtx_queue_empty(sk) &&
 	       refcount_read(&sk->sk_wmem_alloc) > skb->truesize;
 }
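
To make the effect of the one-line change easier to see, here is a rough user-space model of the two predicates. It is only a sketch: struct model_sock, struct model_skb and all of their fields are hypothetical stand-ins rather than the kernel's struct sock and struct sk_buff, and the model assumes that, once sent packets live in a separate retransmit queue, a small skb that is still being filled is usually the head of the write queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff (illustration only). */
struct model_skb {
	unsigned int len;      /* payload bytes currently in the skb */
	unsigned int truesize; /* memory accounted for the skb */
};

/* Hypothetical stand-in for struct sock (illustration only). */
struct model_sock {
	const struct model_skb *write_queue_head; /* first unsent skb, or NULL */
	bool rtx_queue_empty;    /* true when nothing sent is still unacked */
	unsigned int wmem_alloc; /* stands in for sk_wmem_alloc */
	bool autocorking;        /* stands in for sysctl_tcp_autocorking */
};

/* Condition before the patch: cork only if skb is not the write queue head. */
bool should_autocork_old(const struct model_sock *sk,
			 const struct model_skb *skb, unsigned int size_goal)
{
	return skb->len < size_goal &&
	       sk->autocorking &&
	       skb != sk->write_queue_head &&
	       sk->wmem_alloc > skb->truesize;
}

/* Condition after the patch: cork only if earlier data is still unacked. */
bool should_autocork_new(const struct model_sock *sk,
			 const struct model_skb *skb, unsigned int size_goal)
{
	return skb->len < size_goal &&
	       sk->autocorking &&
	       !sk->rtx_queue_empty &&
	       sk->wmem_alloc > skb->truesize;
}

/* Walks through two cases: earlier data in flight, then an idle socket. */
void autocork_model_selfcheck(void)
{
	struct model_skb skb = { .len = 100, .truesize = 768 };
	struct model_sock sk = {
		.write_queue_head = &skb, /* the small skb is the queue head */
		.rtx_queue_empty = false, /* an earlier packet is unacked */
		.wmem_alloc = 2304,       /* more than this skb alone */
		.autocorking = true,
	};

	/* Earlier data in flight: old test refuses to cork, new test corks. */
	assert(!should_autocork_old(&sk, &skb, 65536));
	assert(should_autocork_new(&sk, &skb, 65536));

	/* Idle socket: neither variant delays the first small write. */
	sk.rtx_queue_empty = true;
	sk.wmem_alloc = skb.truesize;
	assert(!should_autocork_old(&sk, &skb, 65536));
	assert(!should_autocork_new(&sk, &skb, 65536));
}
```

Calling autocork_model_selfcheck() from a small main() exercises both cases. Under the stated assumption, the old test can never hold for the skb at the head of the write queue, so small TCP_NODELAY writes go out one by one, while the new test lets them coalesce whenever earlier data is still outstanding. The numbers (100, 768, 2304, 65536) are arbitrary illustration values, not measurements.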