Re: [PATCH RFC] mm/madvise: introduce MADV_POPULATE to prefault/prealloc memory
From: David Hildenbrand <hidden>
Date: 2021-02-19 11:12:45
Also in:
linux-arch, linux-mips, linux-mm, lkml
On 19.02.21 12:04, Michal Hocko wrote:
On Fri 19-02-21 11:43:48, David Hildenbrand wrote:quoted
On 19.02.21 11:35, Michal Hocko wrote:quoted
On Wed 17-02-21 16:48:44, David Hildenbrand wrote: [...] I only got to the implementation now.quoted
+static long madvise_populate(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end) +{ + struct mm_struct *mm = vma->vm_mm; + unsigned long tmp_end; + int locked = 1; + long pages; + + *prev = vma; + + while (start < end) { + /* + * We might have temporarily dropped the lock. For example, + * our VMA might have been split. + */ + if (!vma || start >= vma->vm_end) { + vma = find_vma(mm, start); + if (!vma) + return -ENOMEM; + }Why do you need to find a vma when you already have one. do_madvise will give you your vma already. I do understand that you want to finish the vma for some errors but that shouldn't require handling vmas. You should be in the shope of one here unless I miss anything.See below, we might temporary drop the lock while not having processed all pagesquoted
quoted
+ + /* Bail out on incompatible VMA types. */ + if (vma->vm_flags & (VM_IO | VM_PFNMAP) || + !vma_is_accessible(vma)) { + return -EINVAL; + } + + /* + * Populate pages and take care of VM_LOCKED: simulate user + * space access. + * + * For private, writable mappings, trigger a write fault to + * break COW (i.e., shared zeropage). For other mappings (i.e., + * read-only, shared), trigger a read fault. + */ + tmp_end = min_t(unsigned long, end, vma->vm_end); + pages = populate_vma_page_range(vma, start, tmp_end, &locked); + if (!locked) { + mmap_read_lock(mm); + *prev = NULL; + vma = NULL;^ here so, the VMA might have been replaced/split/... in the meantime. So to make forward progress, I have to lookup again. (similar. but different to madvise_dontneed_free()).Right. Missed that.
It would look more natural if we'd just be processing the whole range - but then it would not fit into the generic infrastructure and would result in even more code. I decided to go with "process the passed range and treat the given VMA as an initial VMA that is invalidated as soon as we drop the lock". -- Thanks, David / dhildenb