Re: [PATCH] PR_*ET_THP_DISABLE.2const: document addition of PR_THP_DISABLE_EXCEPT_ADVISED
From: Usama Arif <hidden>
Date: 2025-09-01 16:58:45
Also in:
lkml
On 01/09/2025 17:36, Alejandro Colomar wrote:
Hi Usama, On Mon, Sep 01, 2025 at 05:09:03PM +0100, Usama Arif wrote:quoted
PR_THP_DISABLE_EXCEPT_ADVISED extended PR_SET_THP_DISABLE to only provide THPs when advised. IOW, it allows individual processes to opt-out of THP = "always" into THP = "madvise", without affecting other workloads on the system. The series has been merged in [1]. This patch documents the changes introduced due to the addition of PR_THP_DISABLE_EXCEPT_ADVISED flag: - PR_GET_THP_DISABLE returns a value whose bits indicate how THP-disable is configured for the calling thread (with or without PR_THP_DISABLE_EXCEPT_ADVISED). - PR_SET_THP_DISABLE now uses arg3 to specify whether to disable THP completely for the process, or disable except madvise (PR_THP_DISABLE_EXCEPT_ADVISED). [1] https://lore.kernel.org/all/20250815135549.130506-1-usamaarif642@gmail.com/ (local) Signed-off-by: Usama Arif <redacted>Thanks for writing the patch! Please see some comments below.
Thanks for the quick review! Its my first time writing a man page so was apologies if there were some basic mistakes in formatting.
quoted
--- man/man2/madvise.2 | 4 +- man/man2const/PR_GET_THP_DISABLE.2const | 18 ++++++--- man/man2const/PR_SET_THP_DISABLE.2const | 52 +++++++++++++++++++++---- 3 files changed, 61 insertions(+), 13 deletions(-)diff --git a/man/man2/madvise.2 b/man/man2/madvise.2 index 10cc21fa4..6a5290f67 100644 --- a/man/man2/madvise.2 +++ b/man/man2/madvise.2@@ -373,7 +373,9 @@ nor can it be stack memory or backed by a DAX-enabled device (unless the DAX device is hot-plugged as System RAM). The process must also not have .B PR_SET_THP_DISABLE -set (see +set without the +.B PR_THP_DISABLE_EXCEPT_ADVISED +flag (see .BR prctl (2)).Double negation is confusing. Please rephrase to something like The process can have X set only if Y is also set.
Yes, makes sense, will change to belwow in the next revision: The process can have .B PR_SET_THP_DISABLE set only if .B PR_THP_DISABLE_EXCEPT_ADVISED flag is set (see .BR prctl (2)).
quoted
.IP Thediff --git a/man/man2const/PR_GET_THP_DISABLE.2const b/man/man2const/PR_GET_THP_DISABLE.2const index 38ff3b370..df239700f 100644 --- a/man/man2const/PR_GET_THP_DISABLE.2const +++ b/man/man2const/PR_GET_THP_DISABLE.2const@@ -6,7 +6,7 @@ .SH NAME PR_GET_THP_DISABLE \- -get the state of the "THP disable" flag for the calling thread +get the state of the "THP disable" flags for the calling thread .SH LIBRARY Standard C library .RI ( libc ,\~ \-lc )@@ -18,13 +18,21 @@ Standard C library .B int prctl(PR_GET_THP_DISABLE, 0L, 0L, 0L, 0L); .fi .SH DESCRIPTION -Return the current setting of -the "THP disable" flag for the calling thread: -either 1, if the flag is set, or 0, if it is not. +Returns a value whose bits indicate how THP-disable is configureds/Returns/Return/
ack
quoted
+for the calling thread. +The returned value is interpreted as follows: +.P +.nf +.B "Bits" +.B " 1 0 Value Description" + 0 0 0 No THP-disable behaviour specified. + 0 1 1 THP is entirely disabled for this process. + 1 1 3 THP-except-advised mode is set for this process.We should probably use a table with .TS/.TE. See examples of this in other manual pages for how to use that (or read tbl(1) if you want). If you don't know how to use that, I can do it myself. tbl(1) is a bit weird.
I tried below, and it seemed to look ok in the output, but please let me know if its ok: .TS allbox; cb cb cb l c c c l. Bit 1 Bit 0 Value Description 0 0 0 No THP-disable behaviour specified. 0 1 1 THP is entirely disabled for this process. 1 1 3 THP-except-advised mode is set for this process. .TE
quoted
+.fi .SH RETURN VALUE On success, .BR PR_GET_THP_DISABLE , -returns the boolean value described above. +returns the value described above. On error, \-1 is returned, and .I errno is set to indicate the error.diff --git a/man/man2const/PR_SET_THP_DISABLE.2const b/man/man2const/PR_SET_THP_DISABLE.2const index 564e005d4..9f0f17702 100644 --- a/man/man2const/PR_SET_THP_DISABLE.2const +++ b/man/man2const/PR_SET_THP_DISABLE.2const@@ -6,7 +6,7 @@ .SH NAME PR_SET_THP_DISABLE \- -set the state of the "THP disable" flag for the calling thread +set the state of the "THP disable" flags for the calling thread .SH LIBRARY Standard C library .RI ( libc ,\~ \-lc )@@ -15,24 +15,62 @@ Standard C library .BR "#include <linux/prctl.h>" " /* Definition of " PR_* " constants */" .B #include <sys/prctl.h> .P -.BI "int prctl(PR_SET_THP_DISABLE, long " flag ", 0L, 0L, 0L);" +.BI "int prctl(PR_SET_THP_DISABLE, long " thp_disable ", unsigned long " flags ", 0L, 0L);"Hmmm, I'm reading this weirdly. Old code doing prctl(PR_SET_THP_DIABLE, 1, 0L, 0L, 0L); would be transformed from setting the flag before, to now using 0L as flags? Or how is backwards compatibility handled?
Its still backwards compatible. The name of the arguments is changed, but the arg values have not. Before you could do 2 things: prctl(PR_SET_THP_DISABLE, 0, 0, 0, 0); // to reset THP setting. prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0); // to disable THPs completely. Now in addition to the 2 calls above, you can do: prctl(PR_SET_THP_DISABLE, 1, PR_THP_DISABLE_EXCEPT_ADVISED, 0, 0); // to disable THPs except madvise. Before arg2 was called flags and arg3 was always 0. Now arg2 is called thp_disable, and arg3 is called flags.
quoted
.fi .SH DESCRIPTION -Set the state of the "THP disable" flag for the calling thread. +Set the state of the "THP disable" flags for the calling thread. If -.I flag -has a nonzero value, the flag is set, otherwise it is cleared. +.I thp_disable +has a nonzero value, the THP disable flag is set according to the value ofPlease break the line after the comma.
ack
quoted
+.I flags, +otherwise it is cleared. .P -Setting this flag provides a method +This +.BR prctl (2) +provides a method for disabling transparent huge pages for jobs where the code cannot be modified, and using a malloc hook with .BR madvise (2) is not an option (i.e., statically allocated data). -The setting of the "THP disable" flag is inherited by a child created via +The setting of the "THP disable" flags is inherited by a child created via .BR fork (2) and is preserved across .BR execve (2). +.P +The behavior depends on the value of +.IR flags: +.TP +.B 0 +The +.BR prctl (2) +call will disable THPs completely for the process, +irrespective of global THP controls or +.BR MADV_COLLAPSE . +.TP +.B PR_THP_DISABLE_EXCEPT_ADVISED +The +.BR prctl (2) +call will disable THPs for the process except when the usage of THPs is +advised.Please break the line before 'except'. See 'Use semantic newlines' in man-pages(7).
ack>
quoted
+Consequently, THPs will only be used when: +.RS +.IP \[bu] 2s/2/3/
ack
See man-pages(7) ("Lists"): There should always be exactly 2 spaces between the list symbol and the elements. This doesn't apply to "tagged paragraphs", which use the default indentation rules. (If you grep(1) around, you'll see that number everywhere.)quoted
+Global THP controls are set to "always" or "madvise" and +.BR madvise (..., +.BR MADV_HUGEPAGE )I'd say .I madvise(..., MADV_HUGEPAGE) as an inlined expression, which goes in full italics; that's simpler.
This results in the entire line being underlined, which is probably not what not what we want?
quoted
+or +.BR madvise (..., +.BR MADV_COLLAPSE ) +is used. +.IP \[bu] +Global THP controls are set to "never" and +.BR madvise (..., +.BR MADV_COLLAPSE ) +is used. +This is the same behavior as if THPs would not be disabled on +a process level.Please break the line before "as if".
ack
quoted
+.RE .SH RETURN VALUE On success, 0 is returned.Have a lovely day! Alex