
Re: [PATCH v2 0/5] mm: reduce mmap_lock contention and improve page fault performance

From: Yang Shi <hidden>
Date: 2026-05-20 21:40:02
Also in: linux-arm-kernel, linux-mm, linux-riscv, linux-s390, lkml, loongarch

On Wed, May 20, 2026 at 3:34 AM David Hildenbrand (Arm) wrote:
> On 5/19/26 14:53, Lorenzo Stoakes wrote:
> > On Mon, May 18, 2026 at 12:56:59PM -0700, Suren Baghdasaryan wrote:
> > > > I think we either need to fix `fork()`, or keep the current
> > > > behavior of dropping the VMA lock before performing I/O.
> > >
> > > I see. So, this problem arises from the fact that we are changing the
> > > pagefaults requiring I/O operation to hold VMA lock...
> > > And you want to lock VMA on fork only if vma_is_anonymous(vma) ||
> > > is_cow_mapping(vma->vm_flags). So, we will be blocking page faults for
> > > anonymous and COW VMAs only while holding mmap_write_lock, preventing
> > > any VMA modification. On the surface, that looks ok to me but I might
> > > be missing some corner cases. If nobody sees any obvious issues, I
> > > think it's worth a try.
> >
> > Not sure if you noticed but I did raise concerns ;)
> >
> > I wonder if you've confused the fault path and fork here, as I think Barry has
> > been a little unclear on that.
> >
> > What's being suggested in this thread is to fundamentally change fork behaviour
> > so it's different from the entire history of the kernel (or - presumably - at
> > least recent history :)
>
> I don't want fork() to become different in that regard.
>
> There is already a slight difference with vs. without per-VMA locks, because
> there is a window in-between us taking the write mmap_lock and all the per-VMA
> locks. I raised that previously [1] and assumed that it is probably fine.
>
> I also raised in the past why I think we must not allow concurrent page faults,
> at least as soon as anonymous memory is involved [2].

Thanks for sharing the context; it is quite helpful for understanding
the race conditions. Because Lorenzo also raised concerns about the
page fault race, I will reply to all of the page fault race concerns
together in this thread.

IIUC, there is already some sort of race with per-VMA locks. Before
per-VMA locks, mmap_lock locked everything, so a page fault happened
either before fork or after fork. With per-VMA locks, a page fault can
happen during fork on other VMAs which have not been locked yet. For
example, if we have 3 VMAs and have locked only the first one, page
faults can still happen on the other 2 VMAs during fork if they already
have an anon_vma. This is the status quo, and it does not seem harmful.

The bad race shared by David is caused by racing with page copying. So
it seems it will be fine as long as we serialize page copying against
page faults, if I'm not missing anything. Since we decide whether to
copy pages by checking vma->anon_vma, it seems fine not to take the
VMA lock if vma->anon_vma is NULL. This will not introduce more races
either, because setting up a new anon_vma in page fault or madvise
requires taking mmap_lock according to the earlier discussions.
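
To make the idea concrete, here is a minimal userspace sketch of the
fork-side decision described above. It is not kernel code: struct
vma_model, fork_needs_vma_lock() and fork_lock_vmas() are hypothetical
names used only for illustration, and the write_locked flag merely
stands in for the per-VMA write lock. It assumes the caller already
holds mmap_lock for write, so no new anon_vma can appear during the
walk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel definitions. */
struct anon_vma_model { int unused; };

struct vma_model {
    struct anon_vma_model *anon_vma; /* NULL until anonymous pages are set up */
    bool write_locked;               /* models the per-VMA write lock taken by fork */
};

/*
 * Per the reasoning above: pages are only copied for VMAs that already
 * have an anon_vma, so only those need page faults to be excluded.
 */
static bool fork_needs_vma_lock(const struct vma_model *vma)
{
    return vma->anon_vma != NULL;
}

/*
 * Models the fork-side walk over all VMAs. The caller is assumed to hold
 * mmap_lock for write. Returns how many VMAs were write-locked.
 */
static size_t fork_lock_vmas(struct vma_model *vmas, size_t n)
{
    size_t locked = 0;

    assert(vmas != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (fork_needs_vma_lock(&vmas[i])) {
            vmas[i].write_locked = true;
            locked++;
        }
    }
    return locked;
}
```

In this model, a VMA whose anon_vma is NULL is skipped and stays
available to the page fault path, while every VMA that might have pages
copied is locked before the copy would start.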

Thanks,
Yang

> ... and I raised that this is pretty much slower by design right now: "Well, the
> design decision that CONFIG_PER_VMA_LOCK made for now to make page faults fast
> and to make blocking any page faults from happening to be slower ..." [3]
>
> [1] https://lore.kernel.org/all/970295ab-e85d-7af3-76e6-df53a5c52f8b@redhat.com/
> [2] https://lore.kernel.org/all/7e3f35cc-59b9-bf12-b8b1-4ed78223844a@redhat.com/
> [3] https://lore.kernel.org/all/2efa2c89-3765-721d-2c3c-00590054aa5b@redhat.com/
>
> --
> Cheers,
>
> David
  