[PATCH v3 16/16] ioctl_userfaultfd.2: Add read-write protect mode docs

From: Kiryl Shutsemau <hidden>
Date: 2026-05-22 13:39:56
Also in: kvm, linux-doc, linux-kselftest, linux-mm, lkml
Subsystem: the rest · Maintainer: Linus Torvalds

From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>

Userfaultfd read-write protection (UFFDIO_REGISTER_MODE_RWP) is
supported starting from Linux 7.2. It traps every access -- read or
write -- to a present page within a registered range. Document the new
UAPI:

  - UFFD_FEATURE_RWP / UFFD_FEATURE_RWP_ASYNC  capability bits
  - UFFDIO_REGISTER_MODE_RWP                   registration-mode bit
  - 1 << _UFFDIO_RWPROTECT / _UFFDIO_SET_MODE  available-ioctls bits
  - UFFDIO_RWPROTECT                           install / remove RWP
  - UFFDIO_SET_MODE                            runtime sync/async toggle

Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
---
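Illustration only, not part of the commit: the EINVAL rules documented
below for UFFDIO_RWPROTECT and UFFDIO_SET_MODE can be mirrored in a
userspace pre-check. This is a minimal sketch, not the kernel's
implementation. The ex_* names are hypothetical, and the bit values are
placeholders, because the real values live in <linux/userfaultfd.h> and
are not stated in this patch.

```c
#include <errno.h>
#include <stdint.h>

/*
 * PLACEHOLDER bit values, for illustration only. The real values come
 * from <linux/userfaultfd.h> on a kernel carrying this series.
 */
#define EX_FEATURE_RWP             (UINT64_C(1) << 0)
#define EX_FEATURE_RWP_ASYNC       (UINT64_C(1) << 1)
#define EX_RWPROTECT_MODE_RWP      (UINT64_C(1) << 0)
#define EX_RWPROTECT_MODE_DONTWAKE (UINT64_C(1) << 1)

/* Local mirrors of the structures shown in the man page text. */
struct ex_range     { uint64_t start; uint64_t len; };
struct ex_rwprotect { struct ex_range range; uint64_t mode; };
struct ex_set_mode  { uint64_t enable; uint64_t disable; };

/*
 * Apply the documented EINVAL rules for UFFDIO_RWPROTECT.
 * page_size must be non-zero. Returns 0 or EINVAL.
 */
int ex_check_rwprotect(const struct ex_rwprotect *a, uint64_t page_size)
{
    const uint64_t known =
        EX_RWPROTECT_MODE_RWP | EX_RWPROTECT_MODE_DONTWAKE;

    if (a->range.len == 0)
        return EINVAL;          /* empty range */
    if (a->range.start % page_size || a->range.len % page_size)
        return EINVAL;          /* not page-aligned */
    if (a->mode & ~known)
        return EINVAL;          /* unknown mode bit */
    if ((a->mode & EX_RWPROTECT_MODE_DONTWAKE) &&
        (a->mode & EX_RWPROTECT_MODE_RWP))
        return EINVAL;          /* DONTWAKE is valid only when un-protecting */
    return 0;
}

/*
 * Apply the documented EINVAL rules for UFFDIO_SET_MODE.
 * negotiated is the feature mask returned by UFFDIO_API.
 */
int ex_check_set_mode(const struct ex_set_mode *a, uint64_t negotiated)
{
    if ((a->enable | a->disable) & ~EX_FEATURE_RWP_ASYNC)
        return EINVAL;          /* only RWP_ASYNC may be toggled */
    if (a->enable & a->disable)
        return EINVAL;          /* fields must not overlap */
    if ((a->enable & EX_FEATURE_RWP_ASYNC) &&
        !(negotiated & EX_FEATURE_RWP))
        return EINVAL;          /* async needs RWP negotiated */
    return 0;
}
```

A caller could run these checks before issuing the ioctl so that a bad
argument is reported early. The kernel remains the authority and may
reject a request for reasons not modelled here (ENOENT, EAGAIN, EFAULT).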
 man2/ioctl_userfaultfd.2 | 209 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 208 insertions(+), 1 deletion(-)
diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index 504f61d4b0cd..0a24a77ca32b 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -25,7 +25,7 @@
 .\" %%%LICENSE_END
 .\"
 .\"
-.TH IOCTL_USERFAULTFD 2 2021-03-22 "Linux" "Linux Programmer's Manual"
+.TH IOCTL_USERFAULTFD 2 2026-05-22 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ioctl_userfaultfd \- create a file descriptor for handling page faults in user
 space
@@ -214,6 +214,33 @@ memory accesses to the regions registered with userfaultfd.
 If this feature bit is set,
 .I uffd_msg.pagefault.feat.ptid
 will be set to the faulted thread ID for each page-fault message.
+.TP
+.BR UFFD_FEATURE_RWP " (since Linux 7.2)"
+If this feature bit is set,
+the kernel supports read-write protection tracking, and the
+.B UFFDIO_REGISTER_MODE_RWP
+registration mode and the
+.B UFFDIO_RWPROTECT
+ioctl described below become available.
+On kernels or architectures that cannot support this mode, the bit is
+masked out from
+.I uffdio_api.features
+on return; callers should inspect the returned features and fall back
+to another tracking mechanism when the bit is absent.
+.TP
+.BR UFFD_FEATURE_RWP_ASYNC " (since Linux 7.2)"
+If this feature bit is set,
+the kernel will resolve read-write protect faults in place without
+delivering a notification, automatically restoring page permissions and
+letting the faulted thread continue.
+This bit requires
+.B UFFD_FEATURE_RWP
+to be set in the same
+.B UFFDIO_API
+call.
+The async mode can also be toggled at runtime using the
+.B UFFDIO_SET_MODE
+ioctl described below.
 .PP
 The returned
 .I ioctls
@@ -240,6 +267,21 @@ operation is supported.
 The
 .B UFFDIO_WRITEPROTECT
 operation is supported.
+.TP
+.BR "1 << _UFFDIO_RWPROTECT" " (since Linux 7.2)"
+The
+.B UFFDIO_RWPROTECT
+operation is supported.
+This bit is reported only when
+.B UFFD_FEATURE_RWP
+was negotiated successfully.
+.TP
+.BR "1 << _UFFDIO_SET_MODE" " (since Linux 7.2)"
+The
+.B UFFDIO_SET_MODE
+operation is supported.
+This is a file-descriptor-level ioctl and is reported once per
+userfaultfd, independent of any registered range.
 .PP
 This
 .BR ioctl (2)
@@ -327,6 +369,16 @@ Track page faults on missing pages.
 .TP
 .B UFFDIO_REGISTER_MODE_WP
 Track page faults on write-protected pages.
+.TP
+.BR UFFDIO_REGISTER_MODE_RWP " (since Linux 7.2)"
+Track page faults on read-write-protected pages.
+Every access (read or write) to a present page within the registered
+range generates a notification once the range has been protected with
+.BR UFFDIO_RWPROTECT .
+This mode cannot be combined with
+.BR UFFDIO_REGISTER_MODE_WP ;
+attempting to do so returns
+.BR EINVAL .
 .PP
 If the operation is successful, the kernel modifies the
 .I ioctls
@@ -735,6 +787,161 @@ or not registered with userfaultfd write-protect mode.
 .TP
 .B EFAULT
 Encountered a generic fault during processing.
+.SS UFFDIO_RWPROTECT (Since Linux 7.2)
+Read-write-protect or un-protect a userfaultfd-registered memory range
+registered with mode
+.BR UFFDIO_REGISTER_MODE_RWP .
+.PP
+The
+.I argp
+argument is a pointer to a
+.I uffdio_rwprotect
+structure as shown below:
+.PP
+.in +4n
+.EX
+struct uffdio_rwprotect {
+    struct uffdio_range range; /* Range to change RWP on */
+    __u64 mode;                /* Mode flags */
+};
+.EE
+.in
+.PP
+The following mode bits are supported:
+.TP
+.B UFFDIO_RWPROTECT_MODE_RWP
+When this mode bit is set,
+the ioctl installs read-write protection on every present page in the
+range specified by
+.IR range .
+Otherwise the ioctl removes read-write protection from the range, which
+is also how a fault handler resolves an
+.B UFFD_PAGEFAULT_FLAG_RWP
+notification.
+.TP
+.B UFFDIO_RWPROTECT_MODE_DONTWAKE
+When this mode bit is set,
+do not wake up any thread that waits for page-fault resolution after
+the operation.
+This can be specified only if
+.B UFFDIO_RWPROTECT_MODE_RWP
+is not specified.
+.PP
+Read-write protection only affects pages that are currently populated
+in the range; unmapped addresses are left untouched.
+Protection is preserved across page reclaim and migration, but callers must
+re-arm a range with
+.B UFFDIO_RWPROTECT
+after any operation that drops the underlying page
+.RB ( "MADV_DONTNEED " "on anonymous memory, hole-punch on shmem,"
+or truncation of a file mapping).
+.PP
+This
+.BR ioctl (2)
+operation returns 0 on success.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+Possible errors include:
+.TP
+.B EINVAL
+The
+.I start
+or the
+.I len
+field of the
+.I uffdio_range
+structure was not a multiple of the system page size; or
+.I len
+was zero; or the specified range was otherwise invalid; or an invalid
+mode bit was specified; or
+.B UFFDIO_RWPROTECT_MODE_DONTWAKE
+was specified together with
+.BR UFFDIO_RWPROTECT_MODE_RWP .
+.TP
+.B EAGAIN
+The process was interrupted; retry this call.
+.TP
+.B ENOENT
+The range specified in
+.I range
+is not valid.
+For example, the virtual address does not exist,
+or part of the range is not registered with
+.BR UFFDIO_REGISTER_MODE_RWP .
+.TP
+.B EFAULT
+Encountered a generic fault during processing.
+.\"
+.SS UFFDIO_SET_MODE (Since Linux 7.2)
+Toggle userfaultfd features that may be flipped at runtime.
+.PP
+The
+.I argp
+argument is a pointer to a
+.I uffdio_set_mode
+structure as shown below:
+.PP
+.in +4n
+.EX
+struct uffdio_set_mode {
+    __u64 enable;     /* Feature bits to set */
+    __u64 disable;    /* Feature bits to clear */
+};
+.EE
+.in
+.PP
+Bits set in
+.I enable
+turn the named features on; bits set in
+.I disable
+turn them off.
+The two fields must not overlap.
+Currently, only
+.B UFFD_FEATURE_RWP_ASYNC
+is a valid bit in either field; any other bit causes the ioctl to
+return
+.BR EINVAL .
+Enabling
+.B UFFD_FEATURE_RWP_ASYNC
+also requires
+.B UFFD_FEATURE_RWP
+to have been negotiated at
+.B UFFDIO_API
+time.
+.PP
+The toggle takes the per-process
+.I mmap_lock
+in write mode, ensuring that all in-flight fault handlers complete
+before the new mode takes effect.
+This allows a single userfaultfd to switch between lightweight async
+detection and synchronous eviction without re-registering its ranges.
+.PP
+This
+.BR ioctl (2)
+operation returns 0 on success.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+Possible errors include:
+.TP
+.B EINVAL
+A bit other than
+.B UFFD_FEATURE_RWP_ASYNC
+was specified in
+.I enable
+or
+.IR disable ;
+the two fields overlap; or
+.B UFFD_FEATURE_RWP_ASYNC
+was requested without
+.B UFFD_FEATURE_RWP
+having been negotiated.
+.TP
+.B EFAULT
+.I argp
+refers to an address that is outside the calling process's accessible
+address space.
 .SH RETURN VALUE
 See descriptions of the individual operations, above.
 .SH ERRORS
-- 
2.51.2