Re: [RFC PATCH net-next] tcp: Add net.ipv4.tcp_purge_receive_queue sysctl
From: Leon Hwang <hidden>
Date: 2026-02-25 09:48:25
Also in:
linux-doc, lkml
On 25/2/26 16:31, Eric Dumazet wrote:
> On Wed, Feb 25, 2026 at 8:46 AM Leon Hwang [off-list ref] wrote:
>> Introduce a new sysctl knob, net.ipv4.tcp_purge_receive_queue, to address a memory leak scenario related to TCP sockets.
>
> We use the term "memory leak" for a persistent loss of memory (until reboot)
Thanks for the clarification.
> Let's not abuse it and confuse various AI/human agents which will declare emergency situations caused by an inexistent fatal error.
I'll reword it in the next revision.
>> Issue: When a TCP socket in the CLOSE_WAIT state receives a RST packet, the current implementation does not clear the socket's receive queue. This causes SKBs in the queue to remain allocated until the socket is explicitly closed by the application. As a consequence:
>>
>> 1. The page pool pages held by these SKBs are not released.
>
> This situation also applies to normal TCP_ESTABLISHED sockets, when applications do not drain the receive queue. As long as the application has not called close(), the kernel should not assume the application will _not_ read the data that was received.
Understood. This patch provides an option to drain the receive queue in the CLOSE_WAIT + RST case, instead of purging it unconditionally upon receiving a RST packet.
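To make the intended scope concrete, here is a rough user-space model of the opt-in behavior described above. It is only a sketch and not the patch itself: `model_sock`, `model_rcv_rst()` and the `purge_on_rst` flag are hypothetical names that stand in for the socket, the RST handling path and the proposed sysctl.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
enum model_state { MODEL_ESTABLISHED, MODEL_CLOSE_WAIT, MODEL_CLOSE };

struct model_sock {
	enum model_state state;
	size_t rx_queued;	/* segments received but not yet read */
	bool purge_on_rst;	/* stands in for the proposed sysctl */
};

/*
 * Model the arrival of a RST. Queued data is dropped only when the
 * socket is in CLOSE_WAIT and the knob is enabled; in every other
 * case the data stays readable until the application closes.
 * Returns the number of segments dropped by this RST.
 */
size_t model_rcv_rst(struct model_sock *sk)
{
	size_t dropped = 0;

	if (sk->state == MODEL_CLOSE)
		return 0;

	if (sk->state == MODEL_CLOSE_WAIT && sk->purge_on_rst) {
		dropped = sk->rx_queued;
		sk->rx_queued = 0;
	}

	sk->state = MODEL_CLOSE;
	return dropped;
}
```

With the knob off, or with the socket in any state other than CLOSE_WAIT, the model leaves the queue untouched, which matches the concern that data received before the RST should stay readable by default.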
>> 2. The associated page pool cannot be freed.
>>
>> RFC 9293 Section 3.10.7.4 specifies that when a RST is received in CLOSE_WAIT state, "all segment queues should be flushed." However, the current implementation does not flush the receive queue.
>
> Some buggy stacks send RST anyway after FIN. I think that forcibly purging good data received before the RST would add many surprises.
Understood. There is a tcp_write_queue_purge(sk) call in tcp_done_with_error(), which means sk_write_queue is always purged when a RST packet is received. I assume the reason for purging sk_write_queue is that any pending transmissions become meaningless once a RST is received. Would it be better to defer skb_queue_purge(&sk->sk_receive_queue) until after tcp_done_with_error()? [...]
> Please prepare a packetdrill test.
Ack. I'll add a packetdrill test in the next revision.

Thanks,
Leon