Re: [RFC PATCH net-next] tcp: Add net.ipv4.tcp_purge_receive_queue sysctl

From: Eric Dumazet <edumazet@google.com>
Date: 2026-03-03 08:57:11
Also in: linux-doc, lkml

On Tue, Mar 3, 2026 at 9:54 AM Leon Hwang [off-list ref] wrote:



On 3/3/26 16:17, Eric Dumazet wrote:

quoted

On Tue, Mar 3, 2026 at 8:55 AM Leon Hwang [off-list ref] wrote:

quoted



On 3/3/26 14:26, Leon Hwang wrote:

quoted


On 3/3/26 11:55, Eric Dumazet wrote:

quoted

On Tue, Mar 3, 2026 at 3:12 AM Leon Hwang [off-list ref] wrote:

quoted



On 3/3/26 08:22, Jakub Kicinski wrote:

quoted

On Mon, 2 Mar 2026 17:55:59 +0800 Leon Hwang wrote:

quoted

On 26/2/26 09:43, Jakub Kicinski wrote:

quoted

On Wed, 25 Feb 2026 15:46:33 +0800 Leon Hwang wrote:

quoted

Issue:
When a TCP socket in the CLOSE_WAIT state receives a RST packet, the
current implementation does not clear the socket's receive queue. This
causes SKBs in the queue to remain allocated until the socket is
explicitly closed by the application. As a consequence:

1. The page pool pages held by these SKBs are not released.

On what kernel version and driver are you observing this?

# uname -r
6.19.0-061900-generic

# ethtool -i eth0
driver: mlx5_core
version: 6.19.0-061900-generic
firmware-version: 26.43.2566 (MT_0000000531)

Okay... this kernel + driver should just patiently wait for the page
pool to go away.

What is the actual, end user problem that you're trying to solve?
A few kB of data waiting to be freed is not a huge problem.

Yes, it is not a huge problem.

The actual end-user issue was discussed in
"page_pool: Add page_pool_release_stalled tracepoint" [1].

I think it would be useful to provide a way for SREs to purge the
receive queue when CLOSE_WAIT TCP sockets receive RST packets. If the
NIC, e.g., Mellanox, flaps, the underlying page pool and pages can be
released at the same time.

Links:
[1]
https://lore.kernel.org/netdev/b676baa0-2044-4a74-900d-f471620f2896@linux.dev/

Perhaps SRE could use this in an emergency?

ss -t -a state close-wait -K

This ss command is acceptable in an emergency.

However, once a CLOSE_WAIT TCP socket receives an RST packet, it
transitions to the CLOSE state. A socket in the CLOSE state cannot be
killed using the ss approach.

The SKBs remain in the receive queue of the CLOSE socket until it is
closed by the user-space application.

Why does the user-space application not drain the receive queue?

Is there a missing EPOLLIN or something?

The user-space application uses a TCP connection pool. It establishes
several TCP connections at startup and keeps them in the pool.

However, the application does not always drain their receive queues.
Instead, it selects one connection from the pool using a hash algorithm
for communication with the TCP server. When it attempts to write data
through a socket in the CLOSE state, it receives -EPIPE and then closes
it. As a result, TCP connections whose underlying socket state is CLOSE
may retain an SKB in their receive queues if they are not selected for
communication.
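
For illustration only, here is a minimal sketch of how an application could watch idle pooled connections for readability, which is what the EPOLLIN question above points at. It is not the application's actual code. The helper name reap_idle is hypothetical, the portable selectors module stands in for a raw epoll loop, and discarding unsolicited data is an assumption that a real client may not be able to make.

```python
import selectors
import socket


def reap_idle(sel, timeout=0.0):
    """Service idle pooled sockets that became readable.

    Unsolicited payload is read and discarded (an assumption for this
    sketch); a socket that reports EOF or an error is unregistered and
    closed. Returns the list of sockets that were closed.
    Sockets registered with `sel` are expected to be non-blocking.
    """
    closed = []
    for key, _events in sel.select(timeout):
        sock = key.fileobj
        try:
            data = sock.recv(65536)
        except BlockingIOError:
            continue          # spurious wakeup, nothing to read
        except OSError:
            data = b""        # e.g. connection reset: treat as dead
        if data:
            continue          # stale payload drained, keep the socket
        sel.unregister(sock)
        sock.close()          # closing releases anything still queued
        closed.append(sock)
    return closed
```

Calling reap_idle() periodically, or from the main event loop, means a connection that was reset while idle is closed soon after the event instead of whenever the hash next selects it.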

I proposed a solution to address this issue: close the TCP connection if
the underlying sk_err is non-zero.
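
A rough sketch of what such a check might look like from user space, assuming the pool is a plain list of connected sockets. The names pending_error and prune_pool are hypothetical. The kernel's sk_err is visible to applications through getsockopt(SO_ERROR); note that reading SO_ERROR also clears the pending error, so the caller has to act on the value it gets back.

```python
import socket


def pending_error(sock):
    """Return the socket's pending error code (0 if none).

    Reading SO_ERROR clears the pending error on the socket.
    """
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)


def prune_pool(pool):
    """Close pooled sockets with a pending error; return the healthy ones."""
    alive = []
    for sock in pool:
        if pending_error(sock) != 0:
            sock.close()      # frees the receive queue held by the socket
        else:
            alive.append(sock)
    return alive
```

Running prune_pool() over the whole pool on a timer, rather than only on the connection picked for the next write, would cover the connections that are never selected.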

Okay, makes sense to fix the root cause. Applications can be fixed in
a matter of hours, while kernels can stick to hosts for years.
