Re: [PATCH RFC] mm/madvise: introduce MADV_POPULATE to prefault/prealloc memory
From: David Hildenbrand <hidden>
Date: 2021-02-18 13:15:10
Also in:
linux-alpha, linux-arch, linux-mm, lkml

>>>> If we hit hardware errors on pages, ignore them - nothing we really
>>>> can or should do. 3. On errors during MADV_POPULATE, some memory
>>>> might have been populated. Callers have to clean up if they care.
>>> How does caller find out? madvise reports 0 on success so how do you
>>> find out how much has been populated?
>> If there is an error, something might have been populated. In my QEMU
>> implementation, I simply discard the range again, good enough. I don't
>> think we need to really indicate "error and populated" or "error and
>> not populated".
> Agreed. The wording just suggests that the syscall actually provides
> any means for an effective way to handle those errors. Maybe you should
> just stick with the first sentence and drop the second.

Makes sense. "On errors during MADV_POPULATE, some memory might have been populated."
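
A minimal sketch of the caller-side cleanup described above (populate, and on any error discard the whole range because the kernel does not report how far it got). Assumptions: the RFC does not give MADV_POPULATE a numeric value, so MADV_POPULATE_WRITE (value 23 in the generic uapi headers of kernels that provide it) is used as a stand-in; populate_or_discard() is a hypothetical helper name, not something taken from QEMU or from the patch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: stand-in for the RFC's MADV_POPULATE, which has no
 * numeric value in this thread. Older headers may lack the define. */
#ifndef MADV_POPULATE_WRITE
#define MADV_POPULATE_WRITE 23
#endif

/*
 * Hypothetical helper: try to populate [addr, addr + len).
 * On error the range may be partially populated, so the whole range
 * is discarded again with MADV_DONTNEED (best effort).
 * Returns 0 on success, -1 on failure with errno set to the error
 * reported by the populate call.
 */
int populate_or_discard(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);

    if (madvise(addr, len, MADV_POPULATE_WRITE) == 0)
        return 0;

    int saved_errno = errno;
    /* For private anonymous memory this drops whatever was populated. */
    (void)madvise(addr, len, MADV_DONTNEED);
    errno = saved_errno;
    return -1;
}
```

On a kernel that does not know the advice value, madvise() fails with EINVAL and the helper simply takes the discard path, which matches the "some memory might have been populated, clean up if you care" contract.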

>>>> 4. Concurrent changes to the virtual memory layout are tolerated -
>>>> we process each and every PFN only once, though.
>>> I do not understand this. madvise is about virtual address space not
>>> a physical address space.
>> What I wanted to express: if we detect a change in the mapping we
>> don't restart at the beginning, we always make forward progress. We
>> process each virtual address once (on a per-page basis, thus I
>> accidentally used "PFN").
> This is an implicit assumption. Your range can have the same page
> mapped several times in the given address range and all you care about
> is that you fault those which are not present during the virtual
> address space walk. Your syscall can return and large part of the
> address space might be unpopulated because memory reclaim just dropped
> those pages and that would be fine. This shouldn't really imply memory
> presence - mlock does that.

"Concurrent changes to the virtual memory layout are tolerated. The range is processed exactly once."
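
To illustrate the point that a successful populate does not imply memory presence afterwards, here is a small sketch that counts how many pages of a range are resident at the moment of the call, using mincore(). count_resident_pages() is a hypothetical helper name; the result is only a snapshot, since reclaim can drop pages right after the check unless the range is mlock()ed.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: count resident pages in [addr, addr + len).
 * addr must be page-aligned. Returns the number of resident pages,
 * or -1 on error (bad arguments, allocation failure, mincore failure).
 */
long count_resident_pages(void *addr, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return -1;

    size_t npages = (len + (size_t)page - 1) / (size_t)page;
    unsigned char *vec = malloc(npages);
    if (vec == NULL)
        return -1;

    if (mincore(addr, len, vec) != 0) {
        free(vec);
        return -1;
    }

    long resident = 0;
    for (size_t i = 0; i < npages; i++)
        resident += vec[i] & 1; /* bit 0 set: page is resident */

    free(vec);
    return resident;
}
```

A caller that needs the pages to stay resident would use mlock() on the range rather than rely on this count.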

>>>> 5. If MADV_POPULATE succeeds, all memory in the range can be
>>>> accessed without SIGBUS. (of course, not if user space changed
>>>> mappings in the meantime or KSM kicked in on anonymous memory).
>>> I do not see how KSM would change anything here and maybe it is not
>>> really important to mention it. KSM should be really transparent from
>>> the user space POV. Parallel and destructive virtual address space
>>> operations are also expected to change the outcome and there is
>>> nothing the kernel can do about it to provide any meaningful
>>> guarantees. I guess we want to assume a reasonable userspace behavior
>>> here.
>> It's just a note that we cannot protect from someone interfering
>> (discard/ksm/whatever). I'm making that clearer in the cover letter.
> Again that is implicit expectation. madvise will not work for anybody
> shooting an own foot.

Okay, I'll drop that part, thanks!

-- 
Thanks,

David / dhildenb