Thread (5 messages) 5 messages, 2 authors, 2021-08-19

Re: [PATCH v2] madvise.2: Document MADV_POPULATE_READ and MADV_POPULATE_WRITE

From: Michael Kerrisk (man-pages) <hidden>
Date: 2021-08-18 20:58:27
Also in: linux-api, linux-man

Hello David,

On 8/18/21 10:35 AM, David Hildenbrand wrote:
On 17.08.21 23:42, Michael Kerrisk (man-pages) wrote:
quoted
Hello David,

Thank you for writing this! Could you please take
a look at the comments below and revise?
Hi Michael,

thanks for your valuable input. Your feedback will certainly make this 
easier to understand for people that are not heavily involved in MM work :)

[...]
quoted
quoted
  man2/madvise.2 | 107 +++++++++++++++++++++++++++++++++++++++++++++++++
  1 file changed, 107 insertions(+)
diff --git a/man2/madvise.2 b/man2/madvise.2
index f1f384c0c..f6cea9ad2 100644
--- a/man2/madvise.2
+++ b/man2/madvise.2
@@ -469,6 +469,72 @@ If a page is file-backed and dirty, it will be written back to the backing
  storage.
  The advice might be ignored for some pages in the range when it is not
  applicable.
+.TP
+.BR MADV_POPULATE_READ " (since Linux 5.14)"
+Populate (prefault) page tables readable for the whole range without actually
I have trouble to understand "Populate (prefault) page tables readable".
Does it mean that it is just the page tables are being populated, and the
PTEs are marked to indicate that the pages are readable? If yes, I
think some rewording would help.
I actually tried phrasing it similar to our MAP_POPULATE documentation:
	("Populate  (prefault)  page tables for a mapping.")
Yeah, well that description is a bit thin too :-}.
We will prefault all pages, faulting them in.
quoted
quoted
+reading memory.
I don't understand "without actually reading memory"? Do you mean,
"without actually  faulting in the pages"; or something else?
"Populate (prefault) page tables readable, faulting in all pages in the 
range just as if manually reading one byte of each page; however, avoid 
the actual memory access that would have been performed after handling 
the fault."

Does that make it clearer? (avoiding eventually touching the page at all 
can be beneficial, especially when dealing with DAX memory where memory 
access might be expensive)
That text is much better. But, what's still not clear to me then is the
dfference between mmap(2) MAP_POPULATE, and MADV_POPULATE_READ and
MADV_POPULATE_WRITE. What is the differnece, and in what situations 
would one prefer one or the other approach? I think it would be helpful
if the manual page said something about these details.
quoted
quoted
+Depending on the underlying mapping,
+map the shared zeropage,
+preallocate memory or read the underlying file;
+files with holes might or might not preallocate blocks.
+Do not generate
+.B SIGBUS
+when populating fails,
+return an error instead.
Better:

[[
If populating fails, a
.B SIGBUS
signal is not generated; instead, an error i returned.
]]
Sure, thanks.
quoted
quoted
+.IP
+If
+.B MADV_POPULATE_READ
+succeeds,
+all page tables have been populated (prefaulted) readable once.
+If
+.B MADV_POPULATE_READ
+fails,
+some page tables might have been populated.
+.IP
+.B MADV_POPULATE_READ
+cannot be applied to mappings without read permissions
+and special mappings,
+for example,
+marked with the kernel-internal
s/marked/mappings marked/
quoted
+.B VM_PFNMAP
+and
Just checking: should it be "and" or "or" here"?

Looking at the EINVAL error below, I guess "or", and a better
wording would be:

[[
...for example, mappings marked with kernel-internal flags such as
.B VMPPFNMAP
or
.BR BR_V_IO.
]]
Much better. Note that there might be more types of mappings that won't 
work (e.g., initially also secretmem IIRC).
Ahh nice. Since there's about to be a memfd_secret() manual page, 
I suggest adding also "or secret memory regions created using
memfd_secret(2)".
quoted
quoted
+.BR VM_IO .
+.IP
+Note that with
+.BR MADV_POPULATE_READ ,
+the process can be killed at any moment when the system runs out of memory.
+.TP
+.BR MADV_POPULATE_WRITE " (since Linux 5.14)"
+Populate (prefault) page tables writable for the whole range without actually
I have trouble to understand "Populate (prefault) page tables writable".
Does it mean that it is just the page tables are being populated, and the
PTEs are marked to indicate that the pages are writable? If yes, I
think some rewording would help.
quoted
+writing memory.
I don't understand "without actually writing memory"? Do you mean,
"without actually  faulting in the pages"; or something else?
Similar to the other wording:

"Populate (prefault) page tables writable, faulting in all pages in the 
range just as if manually writing one byte of each page; however, avoid 
the actual memory access that would have been performed after handling 
the fault."
Much better, but see also my comments above re MADV_POPULATE_READ.

[...]
quoted
quoted
+.B EFAULT
+.I advice
+is
+.B MADV_POPULATE_READ
+or
+.BR MADV_POPULATE_WRITE ,
+and populating (prefaulting) page tables failed because a
+.B SIGBUS
+would have been generated on actual memory access and the reason is not a
+HW poisoned page.
Maybe:
s/.$/(see the description of MADV_HWPOISON in this page)./
?
Sure, we can add that. But note that MADV_HWPOISON is just one of many 
ways to HWpoison a page.
Then maybe something like:

"(HW poisoned pages can, for example, be created using the
MADV_HWPOISON flag described elsewhere in this page.)"

[...]

Thanks,

Michael

-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help