Re: [PATCH v3 26/35] mm: fall back to mmap_lock if vma->anon_vma is not yet set
From: Hyeonggon Yoo <hidden>
Date: 2023-02-17 18:50:14
Also in:
linux-mm, lkml
On Fri, Feb 17, 2023 at 08:13:01AM -0800, Suren Baghdasaryan wrote:
On Fri, Feb 17, 2023 at 2:21 AM Hyeonggon Yoo [off-list ref] wrote:
On Fri, Feb 17, 2023 at 11:15 AM Suren Baghdasaryan [off-list ref] wrote:
On Thu, Feb 16, 2023 at 11:43 AM Suren Baghdasaryan [off-list ref] wrote:
On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox [off-list ref] wrote:
On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
When vma->anon_vma is not set, page fault handler will set it by either reusing anon_vma of an adjacent VMA if VMAs are compatible or by allocating a new one. find_mergeable_anon_vma() walks VMA tree to find a compatible adjacent VMA and that requires not only the faulting VMA to be stable but also the tree structure and other VMAs inside that tree. Therefore locking just the faulting VMA is not enough for this search. Fall back to taking mmap_lock when vma->anon_vma is not set. This situation happens only on the first page fault and should not affect overall performance.

I think I asked this before, but don't remember getting an answer. Why do we defer setting anon_vma to the first fault? Why don't we set it up at mmap time?

Yeah, I remember that conversation, Matthew, and I could not find the definitive answer at the time. I'll look into that again or maybe someone can answer it here.

After looking into it again I'm still under the impression that vma->anon_vma is populated lazily (during the first page fault rather than at mmap time) to avoid doing extra work for areas which are never faulted. Though I might be missing some important detail here.

I think this is because the kernel cannot merge VMAs that have different anon_vmas? Enabling lazy population of anon_vma could potentially increase the chances of merging VMAs.

Hmm. Do you have a clear explanation why merging chances increase this way? A couple of possibilities I can think of would be:
1. If after mmap'ing a VMA and before faulting the first page into it we often change something that affects anon_vma_compatible() decision, like vm_policy;
2. When mmap'ing VMAs we do not map them consecutively but the final arrangement is actually contiguous.
Don't think either of those cases would be very representative of a usual case but maybe I'm wrong or there is another reason?
Ok. I agree it does not represent common cases.

Hmm, then I wonder how it went from the initial approach of "allocate anon_vma objects only via fork()" [1] to "populate anon_vma at page faults" [2] [3].

Maybe Hugh, Andrea or Andrew have opinions?

[1] anon_vma RFC2, lore.kernel.org
    https://lore.kernel.org/lkml/20040311065254.GT30940@dualathlon.random
[2] The status of object-based reverse mapping, LWN.net
    https://lwn.net/Articles/85908
[3] rmap 39 add anon_vma rmap
    https://gitlab.com/hyeyoo/linux-historical/-/commit/8aa3448cabdfca146aa3fd36e852d0209fb2276a
In the end rather than changing that logic I decided to skip vma->anon_vma==NULL cases because I measured them being less than 0.01% of all page faults, so ROI from changing that would be quite low. But I agree that the logic is weird and maybe we can improve that. I will have to review that again when I'm working on eliminating all these special cases we skip, like swap/userfaults/etc.
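
For readers following the thread, here is a minimal userspace sketch of the fallback being discussed. It is not the code from the patch: struct vma, struct anon_vma, enum fault_path, can_use_vma_lock() and handle_fault() are all hypothetical stand-ins, and no real locking is performed. It only models the decision: a fault on a VMA whose anon_vma is still NULL is sent to the mmap_lock path, which sets anon_vma, so later faults on the same VMA can stay on the per-VMA lock path.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; NOT the real definitions. */
struct anon_vma {
    int unused;
};

struct vma {
    struct anon_vma *anon_vma;  /* NULL until the first fault sets it */
};

/* Which (modelled) locking path served the fault. */
enum fault_path {
    FAULT_PER_VMA_LOCK,  /* only the faulting VMA would be locked */
    FAULT_MMAP_LOCK,     /* fell back to the address-space-wide lock */
};

/*
 * Hypothetical helper: the per-VMA lock path is only allowed once
 * anon_vma exists, because finding or creating one may need to look
 * at neighbouring VMAs, which a single VMA lock does not keep stable.
 */
static bool can_use_vma_lock(const struct vma *vma)
{
    return vma->anon_vma != NULL;
}

/*
 * Hypothetical fault handler. 'fresh' is the anon_vma to install on
 * the slow path; in the kernel this would be reused from a compatible
 * neighbour or newly allocated, which is not modelled here.
 */
static enum fault_path handle_fault(struct vma *vma, struct anon_vma *fresh)
{
    if (can_use_vma_lock(vma))
        return FAULT_PER_VMA_LOCK;

    /* Slow path: in the kernel this part would run under mmap_lock. */
    vma->anon_vma = fresh;
    return FAULT_MMAP_LOCK;
}
```

In this model only the first fault on a given VMA takes the slow path, which matches the description above that the situation happens only on the first page fault.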