Thread (43 messages) 43 messages, 6 authors, 2025-03-14

Re: [PATCH] tcp: check socket state before calling WARN_ON

From: Eric Dumazet <edumazet@google.com>
Date: 2025-02-25 18:43:33
Also in: lkml

On Tue, Feb 25, 2025 at 7:28 PM Yuchung Cheng [off-list ref] wrote:
On Tue, Feb 25, 2025 at 9:25 AM Neal Cardwell [off-list ref] wrote:
quoted
On Mon, Feb 24, 2025 at 4:13 PM Neal Cardwell [off-list ref] wrote:
quoted
On Mon, Feb 3, 2025 at 12:17 AM Youngmin Nam [off-list ref] wrote:
quoted
quoted
Hi Neal,
Thank you for looking into this issue.
When we first encountered this issue, we also suspected that tcp_write_queue_purge() was being called.
We can provide any information you would like to inspect.
Thanks again for raising this issue, and providing all that data!

I've come up with a reproducer for this issue, and an explanation for
why this has only been seen on Android so far, and a theory about a
related socket leak issue, and a proposed fix for the WARN and the
socket leak.

Here is the scenario:

+ user process A has a socket in TCP_ESTABLISHED

+ user process A calls close(fd)

+ socket calls __tcp_close() and tcp_close_state() decides to enter
TCP_FIN_WAIT1 and send a FIN

+ FIN is lost and retransmitted, making the state:
---
 tp->packets_out = 1
 tp->sacked_out = 0
 tp->lost_out = 1
 tp->retrans_out = 1
---

+ someone invokes "ss" to --kill the socket using the functionality in
(1e64e298b8 "net: diag: Support destroying TCP sockets")

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c1e64e298b8cad309091b95d8436a0255c84f54a

 (note: this was added for Android, so would not be surprising to have
this inet_diag --kill run on Android)

+ the ss --kill causes a call to tcp_abort()

+ tcp_abort() calls tcp_write_queue_purge()

+ tcp_write_queue_purge() sets packets_out=0 but leaves lost_out=1,
retrans_out=1

+ tcp_sock still exists in TCP_FIN_WAIT1 but now with an inconsistent state

+ ACK arrives and causes a WARN_ON from tcp_verify_left_out():

#define tcp_verify_left_out(tp) WARN_ON(tcp_left_out(tp) > tp->packets_out)

because the state has:

 ---
 tcp_left_out(tp) = sacked_out + lost_out = 1
  tp->packets_out = 0
---

because the state is:

---
 tp->packets_out = 0
 tp->sacked_out = 0
 tp->lost_out = 1
 tp->retrans_out = 1
---

I guess perhaps one fix would be to just have tcp_write_queue_purge()
zero out those other fields:

---
 tp->sacked_out = 0
 tp->lost_out = 0
 tp->retrans_out = 0
---

However, there is a related and worse problem. Because this killed
socket has tp->packets_out, the next time the RTO timer fires,
Zeroing all inflights stats in tcp_write_queue_purge still makes sense
to me. Why will the RTO timer still fire if packets_out is zeroed?
By definition, tcp_write_queue_purge() must only happen when the
socket reaches a final state.

No further transmit is possible, since this broke a  major TCP
principle (stream mode, no sendmsg() can be zapped)

tcp_write_timer_handler() immediately returns if the final state is reached.

if (((1 << sk->sk_state) & (TCPF_CLOSE | TCPF_LISTEN)) ..
    return;

Also look at INET_CSK_CLEAR_TIMERS, if you want to know why the
retransmit timer can fire.
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help