Re: [RFC PATCH net-next] tcp: Add net.ipv4.tcp_purge_receive_queue sysctl
From: Eric Dumazet <edumazet@google.com>
Date: 2026-03-03 08:57:11
Also in: linux-doc, lkml
On Tue, Mar 3, 2026 at 9:54 AM Leon Hwang [off-list ref] wrote:
On 3/3/26 16:17, Eric Dumazet wrote:
On Tue, Mar 3, 2026 at 8:55 AM Leon Hwang [off-list ref] wrote:
On 3/3/26 14:26, Leon Hwang wrote:
On 3/3/26 11:55, Eric Dumazet wrote:
On Tue, Mar 3, 2026 at 3:12 AM Leon Hwang [off-list ref] wrote:
On 3/3/26 08:22, Jakub Kicinski wrote:
On Mon, 2 Mar 2026 17:55:59 +0800 Leon Hwang wrote:
On 26/2/26 09:43, Jakub Kicinski wrote:
On Wed, 25 Feb 2026 15:46:33 +0800 Leon Hwang wrote:
Issue: When a TCP socket in the CLOSE_WAIT state receives a RST packet, the current implementation does not clear the socket's receive queue. This causes SKBs in the queue to remain allocated until the socket is explicitly closed by the application. As a consequence:

1. The page pool pages held by these SKBs are not released.

On what kernel version and driver are you observing this?

# uname -r
6.19.0-061900-generic
# ethtool -i eth0
driver: mlx5_core
version: 6.19.0-061900-generic
firmware-version: 26.43.2566 (MT_0000000531)

Okay... this kernel + driver should just patiently wait for the page pool to go away. What is the actual, end user problem that you're trying to solve? A few kB of data waiting to be freed is not a huge problem..

Yes, it is not a huge problem. The actual end-user issue was discussed in "page_pool: Add page_pool_release_stalled tracepoint" [1]. I think it would be useful to provide a way for SREs to purge the receive queue when CLOSE_WAIT TCP sockets receive RST packets. If the NIC, e.g., Mellanox, flaps, the underlying page pool and pages can be released at the same time.

Links:
[1] https://lore.kernel.org/netdev/b676baa0-2044-4a74-900d-f471620f2896@linux.dev/

Perhaps SRE could use this in an emergency?

ss -t -a state close-wait -K

This ss command is acceptable in an emergency.

However, once a CLOSE_WAIT TCP socket receives an RST packet, it transitions to the CLOSE state. A socket in the CLOSE state cannot be killed using the ss approach. The SKBs remain in the receive queue of the CLOSE socket until it is closed by the user-space application.

Why user-space application does not drain the receive queue ? Is there a missing EPOLLIN or something ?

The user-space application uses a TCP connection pool. It establishes several TCP connections at startup and keeps them in the pool. However, the application does not always drain their receive queues. Instead, it selects one connection from the pool using a hash algorithm for communication with the TCP server. When it attempts to write data through a socket in the CLOSE state, it receives -EPIPE and then closes it. As a result, TCP connections whose underlying socket state is CLOSE may retain an SKB in their receive queues if they are not selected for communication.

I proposed a solution to address this issue: close the TCP connection if the underlying sk_err is non-zero.
Okay, makes sense to fix the root cause. Applications can be fixed in a matter of hours, while kernels can stick to hosts for years.
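
For illustration only, the application-side fix described above (close a pooled connection once its sk_err is non-zero) might look roughly like the sketch below. It is not the actual application code, which is not shown in this thread. It assumes the pool is a plain list of sockets, and the helper names pending_socket_error() and reap_dead_connections() are hypothetical. User space can read sk_err through getsockopt(SO_ERROR); note that reading it also clears the pending error.

```python
import socket


def pending_socket_error(sock):
    """Return the pending error on sock (0 if none).

    SO_ERROR reports the socket's pending error (sk_err in the kernel)
    and clears it as a side effect of the read.
    """
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)


def reap_dead_connections(pool):
    """Close pooled sockets that carry a pending error; return the rest.

    Hypothetical helper: a pool would call this periodically, or before
    selecting a connection, so that reset connections are closed even if
    they are never picked for a write.
    """
    alive = []
    for sock in pool:
        if pending_socket_error(sock) != 0:
            # close() lets the kernel free whatever is still sitting in
            # the receive queue of the dead socket.
            sock.close()
        else:
            alive.append(sock)
    return alive
```

A pool would then replace its connection list with the return value, e.g. pool = reap_dead_connections(pool), and re-establish any connections that were dropped.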