Thread (21 messages) 21 messages, 6 authors, 2025-05-23

Re: [PATCH v1 net-next 4/6] socket: Remove kernel socket conversion except for net/rds/.

From: Chuck Lever <chuck.lever@oracle.com>
Date: 2025-05-22 16:40:27
Also in: linux-nfs, linux-nvme, linux-rdma, mptcp

On 5/22/25 4:55 AM, Paolo Abeni wrote:
On 5/17/25 5:50 AM, Kuniyuki Iwashima wrote:
quoted
Since commit 26abe14379f8 ("net: Modify sk_alloc to not reference
count the netns of kernel sockets."), TCP kernel socket has caused
many UAF.

We have converted such sockets to hold netns refcnt, and we have
the same pattern in cifs, mptcp, nvme, rds, smc, and sunrpc.

  __sock_create_kern(..., &sock);
  sk_net_refcnt_upgrade(sock->sk);

Let's drop the conversion and use sock_create_kern() instead.

The changes for cifs, mptcp, nvme, and smc are straightforward.

For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
call __sock_create_kern() for others.

For rds, we cannot drop sk_net_refcnt_upgrade() for accept()ed
sockets.

Signed-off-by: Kuniyuki Iwashima <redacted>
This LGTM, but is touching a few other subsystems, it would be great to
collect acks from the relevant maintainers: I'm adding a few CCs.

Direct link to the series:

https://lore.kernel.org/all/20250517035120.55560-1-kuniyu@amazon.com/#t (local)
Thank you, Paolo, for forwarding this series.

For all hunks modifying net/sunrpc/svcsock.c and
net/handshake/handshake-test.c:

  Acked-by: Chuck Lever [off-list ref]

Regarding patch 4/6:

This paragraph in the patch description needs to explain /why/ sunrpc
is an exception:
For sunrpc, we call sock_create_net() for IPPROTO_TCP only and still
call __sock_create_kern() for others.
The below hunk doesn't seem related to the marquee purpose of this
series. Should it be a separate patch with its own rationale?
@@ -1541,8 +1544,8 @@ static struct svc_xprt *svc_create_socket(struct
svc_serv *serv,
 	newlen = error;

 	if (protocol == IPPROTO_TCP) {
-		sk_net_refcnt_upgrade(sock->sk);
-		if ((error = kernel_listen(sock, 64)) < 0)
+		error = kernel_listen(sock, 64);
+		if (error < 0)
 			goto bummer;
 	}

quoted
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 37a2ba38f10e..c7b4f5a7cca1 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -3348,21 +3348,14 @@ generic_ip_connect(struct TCP_Server_Info *server)
 		socket = server->ssocket;
 	} else {
 		struct net *net = cifs_net_ns(server);
-		struct sock *sk;
 
-		rc = __sock_create_kern(net, sfamily, SOCK_STREAM,
-					IPPROTO_TCP, &server->ssocket);
+		rc = sock_create_kern(net, sfamily, SOCK_STREAM,
+				      IPPROTO_TCP, &server->ssocket);
 		if (rc < 0) {
 			cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
 			return rc;
 		}
 
-		sk = server->ssocket->sk;
-		__netns_tracker_free(net, &sk->ns_tracker, false);
-		sk->sk_net_refcnt = 1;
-		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
-		sock_inuse_add(net, 1);
AFAICS the above implicitly adds a missing net_passive_dec(net), which
in turns looks like a separate bugfix. What about adding a separate
patch introducing that line? Could be in the same series to simplify the
processing.

Thanks,

Paolo

-- 
Chuck Lever
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help