Thread (15 messages) 15 messages, 6 authors, 2022-08-18

Re: [PATCH net v2] Revert "tcp: change pingpong threshold to 3"

From: Neal Cardwell <ncardwell@google.com>
Date: 2022-08-15 21:22:05
Also in: stable

=




On Mon, Aug 15, 2022 at 2:54 PM Wei Wang [off-list ref] wrote:
On Mon, Aug 15, 2022 at 6:30 AM Neal Cardwell [off-list ref] wrote:
quoted
On Mon, Aug 15, 2022 at 3:48 AM Jiri Slaby [off-list ref] wrote:
quoted
On 06. 08. 22, 16:41, Jiri Slaby wrote:
quoted
On 06. 08. 22, 13:24, Neal Cardwell wrote:
quoted
On Sat, Aug 6, 2022 at 6:02 AM Jiri Slaby [off-list ref] wrote:
quoted
On 21. 07. 22, 22:44, Wei Wang wrote:
quoted
This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c.

This to-be-reverted commit was meant to apply a stricter rule for the
stack to enter pingpong mode. However, the condition used to check for
interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is
jiffy based and might be too coarse, which delays the stack entering
pingpong mode.
We revert this patch so that we no longer use the above condition to
determine interactive session, and also reduce pingpong threshold to 1.

Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3")
Reported-by: LemmyHuang <redacted>
Suggested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Wei Wang <redacted>

This breaks python-eventlet [1] (and was backported to stable trees):
________________ TestHttpd.test_018b_http_10_keepalive_framing
_________________

self = <tests.wsgi_test.TestHttpd
testMethod=test_018b_http_10_keepalive_framing>

      def test_018b_http_10_keepalive_framing(self):
          # verify that if an http/1.0 client sends connection:
keep-alive
          # that we don't mangle the request framing if the app doesn't
read the request
          def app(environ, start_response):
              resp_body = {
                  '/1': b'first response',
                  '/2': b'second response',
                  '/3': b'third response',
              }.get(environ['PATH_INFO'])
              if resp_body is None:
                  resp_body = 'Unexpected path: ' + environ['PATH_INFO']
                  if six.PY3:
                      resp_body = resp_body.encode('latin1')
              # Never look at wsgi.input!
              start_response('200 OK', [('Content-type', 'text/plain')])
              return [resp_body]

          self.site.application = app
          sock = eventlet.connect(self.server_addr)
          req_body = b'GET /tricksy HTTP/1.1\r\n'
          body_len = str(len(req_body)).encode('ascii')

          sock.sendall(b'PUT /1 HTTP/1.0\r\nHost:
localhost\r\nConnection: keep-alive\r\n'
                       b'Content-Length: ' + body_len + b'\r\n\r\n' +
req_body)
          result1 = read_http(sock)
          self.assertEqual(b'first response', result1.body)
          self.assertEqual(result1.headers_original.get('Connection'),
'keep-alive')

          sock.sendall(b'PUT /2 HTTP/1.0\r\nHost:
localhost\r\nConnection: keep-alive\r\n'
                       b'Content-Length: ' + body_len + b'\r\nExpect:
100-continue\r\n\r\n')
          # Client may have a short timeout waiting on that 100 Continue
          # and basically immediately send its body
          sock.sendall(req_body)
          result2 = read_http(sock)
          self.assertEqual(b'second response', result2.body)
          self.assertEqual(result2.headers_original.get('Connection'),
'close')

  >       sock.sendall(b'PUT /3 HTTP/1.0\r\nHost:
localhost\r\nConnection: close\r\n\r\n')

tests/wsgi_test.py:648:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _ _ _
eventlet/greenio/base.py:407: in sendall
      tail = self.send(data, flags)
eventlet/greenio/base.py:401: in send
      return self._send_loop(self.fd.send, data, flags)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
_ _ _ _

self = <eventlet.greenio.base.GreenSocket object at 0x7f5f2f73c9a0>
send_method = <built-in method send of socket object at 0x7f5f2f73d520>
data = b'PUT /3 HTTP/1.0\r\nHost: localhost\r\nConnection:
close\r\n\r\n'
args = (0,), _timeout_exc = timeout('timed out'), eno = 32

      def _send_loop(self, send_method, data, *args):
          if self.act_non_blocking:
              return send_method(data, *args)

          _timeout_exc = socket_timeout('timed out')
          while True:
              try:
  >               return send_method(data, *args)
E               BrokenPipeError: [Errno 32] Broken pipe

eventlet/greenio/base.py:388: BrokenPipeError
====================

Reverting this revert on the top of 5.19 solves the issue.

Any ideas?
Interesting. This revert should return the kernel back to the delayed
ACK behavior it had for many years before May 2019 and Linux 5.1,
which contains the commit it is reverting:

   4a41f453bedfd tcp: change pingpong threshold to 3

It sounds like perhaps this test you mention has an implicit
dependence on the timing of delayed ACKs.

A few questions:
Dunno. I am only an openSUSE kernel maintainer and this popped out at
me. Feel free to dig to eventlet's sources on your own :P.
Any updates on this or should I send a revert directly?

The "before() &&" part of the patch makes the difference. That is this diff:
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -172,9 +172,17 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
          * and it is a reply for ato after last received packet,
          * increase pingpong count.
          */
-       if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
-           (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
+       pr_info("%s: sk=%p (%llx:%x) now=%u lsndtime=%u lrcvtime=%u
ping=%u\n",
+                       __func__, sk, sk->sk_addrpair, sk->sk_portpair, now,
+                       tp->lsndtime, icsk->icsk_ack.lrcvtime,
+                       inet_csk(sk)->icsk_ack.pingpong);
+       if (//before(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
+           (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato) {
                 inet_csk_inc_pingpong_cnt(sk);
+               pr_info("\tINC ping=%u before=%u\n",
+                               inet_csk(sk)->icsk_ack.pingpong,
+                               before(tp->lsndtime,
icsk->icsk_ack.lrcvtime));
+       }

         tp->lsndtime = now;
  }

makes it work again, and outputs this:
Is the above patch made on top of my reverted patch? It seems not
according to this part of diff.
Then what is the definition of TCP_PINGPONG_THRESH in the working
case? I think that is the key, regardless of the result of:
    before(tp->lsndtime, icsk->icsk_ack.lrcvtime)

I tried to look into what exactly the test is doing, and can't tell
why it is failing. I don't see any check that is based on the timing
of the reply. :(
I hope someone could explain more about what this test is doing.
Yes, the test case is a bit hard to read.

I have a conjecture about what might be going wrong: the eventlet
client code connect() method in
https://github.com/eventlet/eventlet/blob/master/eventlet/green/http/client.py
uses:
          self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
to disable Nagle's algorithm.

However, I don't see anything on the server side in
https://github.com/eventlet/eventlet/blob/master/eventlet/green/http/server.py
disabling Nagle's algorithm.

AFAICT the server code is using TCPServer, so from skimming
  https://github.com/python/cpython/blob/3.10/Lib/socketserver.py
...it seems that the code should probably be setting
disable_nagle_algorithm to True to disable Nagle's algorithm
(disable_nagle_algorithm defaults to False).

If the server code is indeed leaving Nagle's algorithm enabled (the
TCP default), as I suspect here, that's a classic
userspace/application bug, which  means the server performance can be
stalled by waiting for delayed ACKs from the remote side, which means
changes in delayed ACK timing behavior (such as from this kernel
patch) can cause the test to fail.

The appropriate fix is probably to have the eventlet server code use
disable_nagle_algorithm=True.

Just a conjecture. :-)

cheers,
neal
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help