Re: [PATCH v3 26/35] mm: fall back to mmap_lock if vma->anon_vma is not yet set
From: Matthew Wilcox <willy@infradead.org>
Date: 2023-02-17 16:06:51
Also in: linux-arm-kernel, linux-mm, lkml
On Thu, Feb 16, 2023 at 06:14:59PM -0800, Suren Baghdasaryan wrote:
> On Thu, Feb 16, 2023 at 11:43 AM Suren Baghdasaryan [off-list ref] wrote:
> > On Thu, Feb 16, 2023 at 7:44 AM Matthew Wilcox [off-list ref] wrote:
> > > On Wed, Feb 15, 2023 at 09:17:41PM -0800, Suren Baghdasaryan wrote:
> > > > When vma->anon_vma is not set, page fault handler will set it by either reusing anon_vma of an adjacent VMA if VMAs are compatible or by allocating a new one. find_mergeable_anon_vma() walks VMA tree to find a compatible adjacent VMA and that requires not only the faulting VMA to be stable but also the tree structure and other VMAs inside that tree. Therefore locking just the faulting VMA is not enough for this search. Fall back to taking mmap_lock when vma->anon_vma is not set. This situation happens only on the first page fault and should not affect overall performance.
> > >
> > > I think I asked this before, but don't remember getting an answer. Why do we defer setting anon_vma to the first fault? Why don't we set it up at mmap time?
> >
> > Yeah, I remember that conversation, Matthew, and I could not find the definitive answer at the time. I'll look into that again or maybe someone can answer it here.
>
> After looking into it again I'm still under the impression that vma->anon_vma is populated lazily (during the first page fault rather than at mmap time) to avoid doing extra work for areas which are never faulted. Though I might be missing some important detail here.

How often does userspace call mmap() and then _never_ fault on it? I appreciate that userspace might mmap() gigabytes of address space and then only end up using a small amount of it, so populating it lazily makes sense. But creating a region and never faulting on it? The only use-case I can think of is loading shared libraries:

openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
(...)
mmap(NULL, 1970000, PROT_READ, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f0ce612e000
mmap(0x7f0ce6154000, 1396736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x26000) = 0x7f0ce6154000
mmap(0x7f0ce62a9000, 339968, PROT_READ, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17b000) = 0x7f0ce62a9000
mmap(0x7f0ce62fc000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1ce000) = 0x7f0ce62fc000
mmap(0x7f0ce6302000, 53072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f0ce6302000

but that's a file-backed VMA, not an anon VMA.
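
For readers following the thread, the fallback described in the quoted patch description (take mmap_lock when vma->anon_vma is not yet set, otherwise stay on the per-VMA lock) can be modeled in userspace roughly as below. This is only a sketch and not the kernel code: struct vma_model, struct anon_vma_model, enum fault_path, choose_fault_path() and handle_fault() are hypothetical names invented for illustration, and the reuse-or-allocate step is reduced to a single pointer assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not real kernel types. */
struct anon_vma_model { int id; };

struct vma_model {
    struct anon_vma_model *anon_vma;   /* NULL until the first fault sets it */
};

enum fault_path {
    FAULT_PER_VMA_LOCK,   /* only the faulting VMA needs to be stable */
    FAULT_MMAP_LOCK       /* the whole VMA tree must be stable */
};

/*
 * Model of the decision in the patch description: finding a mergeable
 * anon_vma walks neighbouring VMAs, so a VMA without an anon_vma cannot
 * be handled under the per-VMA lock alone.
 */
static enum fault_path choose_fault_path(const struct vma_model *vma)
{
    if (vma->anon_vma == NULL)
        return FAULT_MMAP_LOCK;
    return FAULT_PER_VMA_LOCK;
}

/*
 * Model of one fault. On the slow path the anon_vma gets installed
 * (assumption: "fresh" stands in for either a reused or a newly
 * allocated anon_vma), so later faults on the same VMA take the
 * per-VMA path.
 */
static enum fault_path handle_fault(struct vma_model *vma,
                                    struct anon_vma_model *fresh)
{
    enum fault_path path = choose_fault_path(vma);

    if (path == FAULT_MMAP_LOCK) {
        assert(vma->anon_vma == NULL);
        vma->anon_vma = fresh;
    }
    return path;
}
```

In this model, calling handle_fault() twice on the same freshly created vma_model returns FAULT_MMAP_LOCK the first time and FAULT_PER_VMA_LOCK the second time, which is the "only on the first page fault" behaviour the patch description relies on.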