[PATCH v3 16/16] ioctl_userfaultfd.2: Add read-write protect mode docs
From: Kiryl Shutsemau <hidden>
Date: 2026-05-22 13:39:56
Also in: kvm, linux-doc, linux-kselftest, linux-mm, lkml
Subsystem: the rest · Maintainer: Linus Torvalds
From: "Kiryl Shutsemau (Meta)" <kas@kernel.org>

Userfaultfd read-write protection (UFFDIO_REGISTER_MODE_RWP) is
supported starting from Linux 7.2. It traps every access -- read or
write -- to a present page within a registered range.

The new UAPI documented here:

 - UFFD_FEATURE_RWP / UFFD_FEATURE_RWP_ASYNC capability bits
 - UFFDIO_REGISTER_MODE_RWP registration-mode bit
 - 1 << _UFFDIO_RWPROTECT / _UFFDIO_SET_MODE available-ioctls bits
 - UFFDIO_RWPROTECT install / remove RWP
 - UFFDIO_SET_MODE runtime sync/async toggle

Signed-off-by: Kiryl Shutsemau <kas@kernel.org>
---
 man2/ioctl_userfaultfd.2 | 209 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 208 insertions(+), 1 deletion(-)
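The UFFD_FEATURE_RWP entry in the patch tells callers to inspect the features returned by UFFDIO_API and fall back when the bit is absent. The C code below is a minimal sketch of such a probe, not part of the proposed UAPI. It assumes glibc's syscall(2) wrapper, and it assumes that an installed <linux/userfaultfd.h> may or may not define UFFD_FEATURE_RWP, because the patch does not give the bit's value. The names open_uffd(), probe_rwp() and enum rwp_probe are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical result type for the probe below. */
enum rwp_probe {
	RWP_AVAILABLE,    /* UFFD_FEATURE_RWP was kept by UFFDIO_API */
	RWP_UNAVAILABLE,  /* userfaultfd works, but RWP is not offered */
	UFFD_UNAVAILABLE, /* userfaultfd itself cannot be used here */
};

/* Open a userfaultfd, preferring user-mode-only faults so that an
 * unprivileged caller can still succeed when
 * vm.unprivileged_userfaultfd is 0. */
static int open_uffd(void)
{
	int fd = -1;

#ifdef UFFD_USER_MODE_ONLY
	fd = (int)syscall(SYS_userfaultfd,
			  O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
#endif
	if (fd < 0) /* e.g. kernels before 5.11 reject UFFD_USER_MODE_ONLY */
		fd = (int)syscall(SYS_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	return fd;
}

/* Hypothetical probe: UFFDIO_API may be issued only once per
 * userfaultfd, so this runs on a throwaway descriptor. */
static enum rwp_probe probe_rwp(void)
{
	struct uffdio_api api = { .api = UFFD_API };
	uint64_t wanted = 0;
	enum rwp_probe ret;
	int fd = open_uffd();

	if (fd < 0)
		return UFFD_UNAVAILABLE;
#ifdef UFFD_FEATURE_RWP
	/* Request the bit only when the installed UAPI header defines it. */
	wanted = UFFD_FEATURE_RWP;
#endif
	api.features = wanted;
	if (ioctl(fd, UFFDIO_API, &api) < 0) {
		/* Do not trust 'api' after a failure; a kernel that does not
		 * know the requested bit rejects it with EINVAL. */
		ret = wanted ? RWP_UNAVAILABLE : UFFD_UNAVAILABLE;
	} else if (wanted && (api.features & wanted)) {
		ret = RWP_AVAILABLE;
	} else {
		/* Per the proposed text, an unsupported bit is masked out. */
		ret = RWP_UNAVAILABLE;
	}
	close(fd);
	return ret;
}
```

After the probe, a tracker would open a fresh userfaultfd and request either UFFD_FEATURE_RWP or whatever fallback it uses, since the probe descriptor has already consumed its one UFFDIO_API call.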
diff --git a/man2/ioctl_userfaultfd.2 b/man2/ioctl_userfaultfd.2
index 504f61d4b0cd..0a24a77ca32b 100644
--- a/man2/ioctl_userfaultfd.2
+++ b/man2/ioctl_userfaultfd.2
@@ -25,7 +25,7 @@
 .\" %%%LICENSE_END
 .\"
 .\"
-.TH IOCTL_USERFAULTFD 2 2021-03-22 "Linux" "Linux Programmer's Manual"
+.TH IOCTL_USERFAULTFD 2 2026-05-22 "Linux" "Linux Programmer's Manual"
 .SH NAME
 ioctl_userfaultfd \- create a file descriptor for handling page faults in user space
@@ -214,6 +214,33 @@ memory accesses to the regions registered with userfaultfd.
 If this feature bit is set,
 .I uffd_msg.pagefault.feat.ptid
 will be set to the faulted thread ID for each page-fault message.
+.TP
+.BR UFFD_FEATURE_RWP " (since Linux 7.2)"
+If this feature bit is set,
+the kernel supports read-write protection tracking: the
+.B UFFDIO_REGISTER_MODE_RWP
+registration mode and the
+.B UFFDIO_RWPROTECT
+ioctl described below are available.
+On kernels or architectures that cannot support this mode, the bit is
+masked out from
+.I uffdio_api.features
+on return; callers should inspect the returned features and fall back
+to another tracking mechanism when the bit is absent.
+.TP
+.BR UFFD_FEATURE_RWP_ASYNC " (since Linux 7.2)"
+If this feature bit is set,
+the kernel resolves read-write-protect faults in place without
+delivering a notification, automatically restoring page permissions and
+letting the faulted thread continue.
+This bit requires
+.B UFFD_FEATURE_RWP
+to be set in the same
+.B UFFDIO_API
+call.
+The async mode can also be toggled at runtime using the
+.B UFFDIO_SET_MODE
+ioctl described below.
 .PP
 The returned
 .I ioctls
@@ -240,6 +267,21 @@ operation is supported.
 The
 .B UFFDIO_WRITEPROTECT
 operation is supported.
+.TP
+.BR "1 << _UFFDIO_RWPROTECT" " (since Linux 7.2)"
+The
+.B UFFDIO_RWPROTECT
+operation is supported.
+This bit is reported only when
+.B UFFD_FEATURE_RWP
+was negotiated successfully.
+.TP
+.BR "1 << _UFFDIO_SET_MODE" " (since Linux 7.2)"
+The
+.B UFFDIO_SET_MODE
+operation is supported.
+This is a file-descriptor-level ioctl and is reported once per
+userfaultfd, independent of any registered range.
 .PP
 This
 .BR ioctl (2)
@@ -327,6 +369,16 @@ Track page faults on missing pages.
 .TP
 .B UFFDIO_REGISTER_MODE_WP
 Track page faults on write-protected pages.
+.TP
+.BR UFFDIO_REGISTER_MODE_RWP " (since Linux 7.2)"
+Track page faults on read-write-protected pages.
+Every access (read or write) to a present page within the registered
+range generates a notification once the range has been protected with
+.BR UFFDIO_RWPROTECT .
+This mode cannot be combined with
+.BR UFFDIO_REGISTER_MODE_WP ;
+attempting to do so returns
+.BR EINVAL .
 .PP
 If the operation is successful, the kernel modifies the
 .I ioctls
@@ -735,6 +787,161 @@ or not registered with userfaultfd write-protect mode.
 .TP
 .B EFAULT
 Encountered a generic fault during processing.
+.SS UFFDIO_RWPROTECT (Since Linux 7.2)
+Install or remove read-write protection on a memory range that was
+registered with mode
+.BR UFFDIO_REGISTER_MODE_RWP .
+.PP
+The
+.I argp
+argument is a pointer to a
+.I uffdio_rwprotect
+structure as shown below:
+.PP
+.in +4n
+.EX
+struct uffdio_rwprotect {
+    struct uffdio_range range; /* Range to change RWP on */
+    __u64 mode;                /* Mode flags */
+};
+.EE
+.in
+.PP
+The following mode bits are supported:
+.TP
+.B UFFDIO_RWPROTECT_MODE_RWP
+When this mode bit is set,
+the ioctl installs read-write protection on every present page in the
+range specified by
+.IR range .
+Otherwise the ioctl removes read-write protection from the range, which
+is also how a fault handler resolves a
+.B UFFD_PAGEFAULT_FLAG_RWP
+notification.
+.TP
+.B UFFDIO_RWPROTECT_MODE_DONTWAKE
+When this mode bit is set,
+do not wake up any thread that waits for page-fault resolution after
+the operation.
+This can be specified only if
+.B UFFDIO_RWPROTECT_MODE_RWP
+is not specified.
+.PP
+Read-write protection affects only pages that are currently populated
+in the range; unpopulated addresses are left untouched.
+Protection is preserved across page reclaim and migration, but callers must
+re-arm a range with
+.B UFFDIO_RWPROTECT
+after any operation that drops the underlying page
+.RB ( "MADV_DONTNEED " "on anonymous memory, hole-punch on shmem,"
+truncation of a file mapping).
+.PP
+This
+.BR ioctl (2)
+operation returns 0 on success.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+Possible errors include:
+.TP
+.B EINVAL
+The
+.I start
+or the
+.I len
+field of the
+.I uffdio_range
+structure was not a multiple of the system page size; or
+.I len
+was zero; or the specified range was otherwise invalid; or an invalid
+mode bit was specified; or
+.B UFFDIO_RWPROTECT_MODE_DONTWAKE
+was specified together with
+.BR UFFDIO_RWPROTECT_MODE_RWP .
+.TP
+.B EAGAIN
+The process was interrupted; retry this call.
+.TP
+.B ENOENT
+The range specified in
+.I range
+is not valid.
+For example, the virtual address does not exist,
+or part of the range is not registered with
+.BR UFFDIO_REGISTER_MODE_RWP .
+.TP
+.B EFAULT
+Encountered a generic fault during processing.
+.\"
+.SS UFFDIO_SET_MODE (Since Linux 7.2)
+Enable or disable userfaultfd features that can be changed at runtime.
+.PP
+The
+.I argp
+argument is a pointer to a
+.I uffdio_set_mode
+structure as shown below:
+.PP
+.in +4n
+.EX
+struct uffdio_set_mode {
+    __u64 enable;   /* Feature bits to set */
+    __u64 disable;  /* Feature bits to clear */
+};
+.EE
+.in
+.PP
+Bits set in
+.I enable
+turn the named features on; bits set in
+.I disable
+turn them off.
+The two fields must not overlap.
+Currently, only
+.B UFFD_FEATURE_RWP_ASYNC
+is a valid bit in either field; any other bit causes the ioctl to
+return
+.BR EINVAL .
+Enabling
+.B UFFD_FEATURE_RWP_ASYNC
+also requires
+.B UFFD_FEATURE_RWP
+to have been negotiated at
+.B UFFDIO_API
+time.
+.PP
+The toggle takes the per-process
+.I mmap_lock
+in write mode, ensuring that all in-flight fault handlers complete
+before the new mode takes effect.
+This allows a single userfaultfd to switch between lightweight async
+detection and synchronous eviction without re-registering its ranges.
+.PP
+This
+.BR ioctl (2)
+operation returns 0 on success.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+Possible errors include:
+.TP
+.B EINVAL
+A bit other than
+.B UFFD_FEATURE_RWP_ASYNC
+was specified in
+.I enable
+or
+.IR disable ;
+the two fields overlap; or
+.B UFFD_FEATURE_RWP_ASYNC
+was requested without
+.B UFFD_FEATURE_RWP
+having been negotiated.
+.TP
+.B EFAULT
+.I argp
+refers to an address that is outside the calling process's accessible
+address space.
 .SH RETURN VALUE
 See descriptions of the individual operations, above.
 .SH ERRORS
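The EINVAL conditions proposed for UFFDIO_SET_MODE can also be checked in user space before the ioctl is issued. The C function below is a minimal sketch of such a pre-check, not the kernel's implementation. set_mode_request_ok() is a hypothetical helper. Because the patch does not give the numeric values of UFFD_FEATURE_RWP and UFFD_FEATURE_RWP_ASYNC, the caller passes them in as rwp_bit and rwp_async_bit.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical pre-check mirroring the EINVAL rules proposed for
 * UFFDIO_SET_MODE. rwp_bit and rwp_async_bit stand in for
 * UFFD_FEATURE_RWP and UFFD_FEATURE_RWP_ASYNC; negotiated is the
 * features mask that UFFDIO_API returned for this userfaultfd.
 */
static bool set_mode_request_ok(uint64_t enable, uint64_t disable,
				uint64_t negotiated, uint64_t rwp_bit,
				uint64_t rwp_async_bit)
{
	/* Currently only UFFD_FEATURE_RWP_ASYNC may appear in either field. */
	if ((enable | disable) & ~rwp_async_bit)
		return false;
	/* A bit may not be set in both enable and disable. */
	if (enable & disable)
		return false;
	/* Turning async mode on requires RWP to have been negotiated. */
	if ((enable & rwp_async_bit) && !(negotiated & rwp_bit))
		return false;
	return true;
}
```

The kernel remains the authority, so a caller using such a check must still handle EINVAL from the ioctl itself, for example if a later kernel accepts more bits.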
--
2.51.2