Thread: 38 messages, 8 authors, 2021-02-25

Re: [PATCH RFC] mm/madvise: introduce MADV_POPULATE to prefault/prealloc memory

From: Michal Hocko <mhocko@suse.com>
Date: 2021-02-18 13:27:54
Also in: linux-alpha, linux-mips, linux-mm, lkml

On Thu 18-02-21 11:44:41, David Hildenbrand wrote:
> On 18.02.21 11:25, Michal Hocko wrote:
> > On Wed 17-02-21 16:48:44, David Hildenbrand wrote:
> > > When we manage sparse memory mappings dynamically in user space - also
> > > sometimes involving MADV_NORESERVE - we want to dynamically populate/
> >
> > Just wondering what is MADV_NORESERVE? I do not see anything like that
> > in the Linus tree. Did you mean MAP_NORESERVE?
>
> Most certainly, thanks :)

OK, good, I thought I had missed something.
[...]
> > > 2. Errors during MADV_POPULATE (especially OOM) are reported.
> >
> > How do you want to achieve that? The gup/page fault handler will
> > allocate memory and trigger the OOM killer without the caller noticing.
> > You would somehow have to weaken the allocation context to
> > GFP_RETRY_MAYFAIL or NORETRY to achieve the error handling.
>
> Okay, I should be more clear here (again, I'm realizing this as well while I
> create the man page), OOM is confusing: avoid SIGBUS at runtime - like we
> would get on actual file systems/shmem/hugetlbfs when preallocating.

Yes, preventing SIGBUS for unreserved mappings is a reasonable
expectation. Regarding OOM, I am afraid the chances are slim. We used to
have a weaker model for MAP_POPULATE for memcg OOM in the past and it
turned out more problematic than useful.

> It cannot save us from the actual OOM killer. To handle anonymous memory
> more reliably I'll need other means as well (dynamic swap space allocation
> for sparse mappings).
> > >     If we hit
> > >     hardware errors on pages, ignore them - nothing we really can or
> > >     should do.
> > > 3. On errors during MADV_POPULATE, some memory might have been
> > >     populated. Callers have to clean up if they care.
> >
> > How does the caller find out? madvise reports 0 on success, so how do
> > you find out how much has been populated?
>
> If there is an error, something might have been populated. In my QEMU
> implementation, I simply discard the range again, good enough. I don't think
> we need to really indicate "error and populated" or "error and not
> populated".

Agreed. The wording just suggests that the syscall actually provides
some effective means to handle those errors. Maybe you should just
stick with the first sentence and drop the second.
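
For illustration, the "populate, and discard the range again on error" pattern described above might look roughly like the sketch below. It is not the QEMU code and it does not use the RFC's proposed MADV_POPULATE flag; it assumes the MADV_POPULATE_WRITE advice value that Linux 5.14 and later provide, and the helper names populate_or_discard() and map_sparse() are hypothetical. MADV_DONTNEED as the discard step is only appropriate for private anonymous memory; shmem or file-backed ranges would need a different mechanism.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: value from the Linux 5.14+ uapi headers; older libc
 * headers may not define it, and older kernels return EINVAL. */
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/*
 * Hypothetical helper: prefault [addr, addr + len) for writing.
 * Returns 0 on success. On failure, part of the range may already be
 * populated, so the whole range is discarded and -errno is returned.
 */
static int populate_or_discard(void *addr, size_t len)
{
    int err;

    assert(addr != NULL && len > 0);

    if (madvise(addr, len, MADV_POPULATE_WRITE) == 0)
        return 0;

    err = errno;
    /* No "how much was populated" is reported, so drop everything.
     * MADV_DONTNEED frees pages of private anonymous mappings. */
    (void)madvise(addr, len, MADV_DONTNEED);
    return -err;
}

/* Hypothetical helper: sparse private anonymous mapping without swap
 * reservation (MAP_NORESERVE). Returns NULL on failure. */
static void *map_sparse(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}
```

The caller only learns "success" or "failure with an errno", which matches the point made above: there is nothing to report beyond the error itself, and cleanup is a blanket discard.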
 
> > > 4. Concurrent changes to the virtual memory layout are tolerated - we
> > >     process each and every PFN only once, though.
> >
> > I do not understand this. madvise is about the virtual address space,
> > not the physical address space.
>
> What I wanted to express: if we detect a change in the mapping we don't
> restart at the beginning, we always make forward progress. We process each
> virtual address once (on a per-page basis, thus I accidentally used "PFN").

This is an implicit assumption. Your range can have the same page mapped
several times in the given address range and all you care about is that
you fault those which are not present during the virtual address space
walk. Your syscall can return and a large part of the address space might
be unpopulated because memory reclaim just dropped those pages and that
would be fine. This shouldn't really imply memory presence - mlock does
that.
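
To make the "populated is not the same as present" point concrete, here is a rough user-space sketch that counts how many pages of a range are resident at one instant using mincore(2). The helper name count_resident_pages() is hypothetical. The result is only a snapshot: reclaim may drop pages right after the call, and only mlock(2) pins pages in memory.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: number of pages in [addr, addr + len) that are
 * resident right now, or -1 on error. addr must be page aligned.
 */
static long count_resident_pages(void *addr, size_t len)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;
    unsigned char *vec;
    long resident = 0;

    assert(addr != NULL);

    vec = malloc(npages);
    if (vec == NULL)
        return -1;

    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }

    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1;   /* bit 0 set: page is resident */

    free(vec);
    return resident;
}
```

A count lower than the number of pages after a successful populate call would not indicate a bug; it would just mean some pages have been reclaimed in the meantime.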
> > > 5. If MADV_POPULATE succeeds, all memory in the range can be accessed
> > >     without SIGBUS. (of course, not if user space changed mappings in the
> > >     meantime or KSM kicked in on anonymous memory).
> >
> > I do not see how KSM would change anything here and maybe it is not
> > really important to mention it. KSM should be really transparent from
> > the user space POV. Parallel and destructive virtual address space
> > operations are also expected to change the outcome and there is nothing
> > the kernel can do about that to provide any meaningful guarantees. I
> > guess we want to assume a reasonable userspace behavior here.
>
> It's just a note that we cannot protect from someone interfering
> (discard/ksm/whatever). I'm making that clearer in the cover letter.

Again, that is an implicit expectation. madvise will not work for anybody
shooting themselves in the foot.

-- 
Michal Hocko
SUSE Labs