Re: [PATCH net-next v3 2/2] rds: convert to getsockopt_iter: manual merge

From: Matthieu Baerts <matttbe@kernel.org>
Date: 2026-06-11 12:52:47
Also in: linux-kselftest, linux-next, linux-rdma, lkml

Hi Breno, Allison,

On 08/06/2026 11:44, Breno Leitao wrote:
Convert RDS socket's getsockopt implementation to use the new
getsockopt_iter callback with sockopt_t.

Key changes:
- Replace (char __user *optval, int __user *optlen) with sockopt_t *opt
- Use opt->optlen for buffer length (input) and returned size (output)
- Use copy_to_iter() instead of put_user()/copy_to_user()

The RDS_INFO_* snapshot path in rds_info_getsockopt() used to pin the
userspace buffer with pin_user_pages_fast() on the raw optval address;
the info producers then memcpy into those pages under a spinlock via
kmap_atomic() and so must not fault. Obtain the same page array and
starting offset from opt->iter_out with iov_iter_extract_pages(), which
pins for write because iter_out is ITER_DEST.

The page array is preallocated here (sized with iov_iter_npages()) and
passed in, so iov_iter_extract_pages() fills it in place rather than
allocating one for us; RDS therefore keeps ownership of the array on
every return path and frees it itself. The rds_info_iterator /
rds_info_copy machinery and all producer callbacks are unchanged.

Kernel buffers (ITER_KVEC) are not page-backed in a way the info
producers can use, so the RDS_INFO path returns -EOPNOTSUPP for them;
this matches the previous behaviour, where a kernel-buffer getsockopt
hit the WARN_ONCE() path in do_sock_getsockopt() and returned
-EOPNOTSUPP. The simple RDS_RECVERR and SO_RDS_TRANSPORT options keep
working for kernel buffers via copy_to_iter().
(...)
diff --git a/net/rds/info.c b/net/rds/info.c
index f1b29994934a..499b3774860e 100644
--- a/net/rds/info.c
+++ b/net/rds/info.c
(...)
@@ -230,13 +239,16 @@ int rds_info_getsockopt(struct socket *sock, int optname, char __user *optval,
 		ret = lens.each;
 	}
 
-	if (put_user(len, optlen))
-		ret = -EFAULT;
+	opt->optlen = len;
 
 out:
-	if (pages)
+	/*
+	 * iov_iter_extract_pages() pins only user-backed (ubuf) iters;
+	 * iov_iter_extract_will_pin() reports whether an unpin is owed here.
+	 */
+	if (pages && iov_iter_extract_will_pin(&opt->iter_out))
 		unpin_user_pages(pages, nr_pages);

FYI, we got a small conflict when merging 'net' into 'net-next' in the
MPTCP tree due to this patch applied in 'net':

  f512db8267b73 ("rds: mark snapshot pages dirty in rds_info_getsockopt()")

and this one from 'net-next':

  6e94eeb2a2a6 ("rds: convert to getsockopt_iter")

----- Generic Message -----
It is best to avoid conflicts between the 'net' and 'net-next' trees, but
if they cannot be avoided when preparing patches, a note about how to fix
them is much appreciated.

The conflict has been resolved on our side [1] and the resolution we
suggest is attached to this email. Please report any issues linked to
this conflict resolution as it might be used by others. If you worked on
the mentioned patches, don't hesitate to ACK this conflict resolution.
---------------------------

Regarding this conflict, I took the modification from net-next (the
iov_iter_extract_will_pin() check before unpinning), but replaced
unpin_user_pages() with unpin_user_pages_dirty_lock() from net.
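
For readers without the attachment at hand, here is a rough userspace model
of what the merged cleanup path amounts to. It is only a sketch, not the
actual resolution: struct page and struct iov_iter are reduced to toy
stand-ins, the two kernel helpers are replaced by stubs with the same names,
rds_info_release_pages() is a hypothetical wrapper around the "out:" label,
and passing make_dirty = true is an assumption based on the subject line of
the 'net' commit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel types; only the modelled state is kept. */
struct page { bool pinned; bool dirty; };
struct iov_iter { bool user_backed; };

/* Stub: the kernel helper reports whether extracted pages were pinned. */
static bool iov_iter_extract_will_pin(const struct iov_iter *iter)
{
	return iter->user_backed;
}

/* Stub: optionally mark each page dirty, then drop the pin. */
static void unpin_user_pages_dirty_lock(struct page **pages,
					unsigned long npages, bool make_dirty)
{
	for (unsigned long i = 0; i < npages; i++) {
		if (make_dirty)
			pages[i]->dirty = true;
		pages[i]->pinned = false;
	}
}

/*
 * Hypothetical wrapper modelling the "out:" label after the merge:
 * the pin check comes from net-next, the dirtying unpin from net.
 * make_dirty = true is an assumption, not taken from the attachment.
 */
static void rds_info_release_pages(struct iov_iter *iter_out,
				   struct page **pages, unsigned long nr_pages)
{
	if (pages && iov_iter_extract_will_pin(iter_out))
		unpin_user_pages_dirty_lock(pages, nr_pages, true);
}
```

In this model, a user-backed iter gets its pages dirtied and unpinned, while
a kernel-buffer iter (or a NULL page array) is left untouched, which is the
behaviour both patches appear to want.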

Rerere cache is available in [2].

Cheers,
Matt

[1] https://github.com/multipath-tcp/mptcp_net-next/commit/a8d41e018cc6
[2] https://github.com/multipath-tcp/mptcp-upstream-rr-cache/commit/88eeb

-- 
Sponsored by the NGI0 Core fund.
