Re: [RFC PATCH RESEND 00/28] per-VMA locks proposal
From: Kent Overstreet <kent.overstreet@linux.dev>
Date: 2022-09-01 20:59:28
Also in:
linux-arm-kernel, linux-mm, lkml
On Thu, Sep 01, 2022 at 10:34:48AM -0700, Suren Baghdasaryan wrote:
Resending to fix the issue with the In-Reply-To tag in the original submission at [4]. This is a proof of concept for per-vma locks idea that was discussed during SPF [1] discussion at LSF/MM this year [2], which concluded with suggestion that “a reader/writer semaphore could be put into the VMA itself; that would have the effect of using the VMA as a sort of range lock. There would still be contention at the VMA level, but it would be an improvement.” This patchset implements this suggested approach. When handling page faults we lookup the VMA that contains the faulting page under RCU protection and try to acquire its lock. If that fails we fall back to using mmap_lock, similar to how SPF handled this situation. One notable way the implementation deviates from the proposal is the way VMAs are marked as locked. Because during some of mm updates multiple VMAs need to be locked until the end of the update (e.g. vma_merge, split_vma, etc). Tracking all the locked VMAs, avoiding recursive locks and other complications would make the code more complex. Therefore we provide a way to "mark" VMAs as locked and then unmark all locked VMAs all at once. This is done using two sequence numbers - one in the vm_area_struct and one in the mm_struct. VMA is considered locked when these sequence numbers are equal. To mark a VMA as locked we set the sequence number in vm_area_struct to be equal to the sequence number in mm_struct. To unlock all VMAs we increment mm_struct's seq number. This allows for an efficient way to track locked VMAs and to drop the locks on all VMAs at the end of the update.
I like it - the sequence numbers are a stroke of genuius. For what it's doing the patchset seems almost small. Two complaints so far: - I don't like the vma_mark_locked() name. To me it says that the caller already took or is taking the lock and this function is just marking that we're holding the lock, but it's really taking a different type of lock. But this function can block, it really is taking a lock, so it should say that. This is AFAIK a new concept, not sure I'm going to have anything good either, but perhaps vma_lock_multiple()? - I don't like the #ifdef and the separate fallback path in the fault handlers. Can we make find_and_lock_anon_vma() do the right thing, and not fail unless e.g. there isn't a vma at that address? Just have it wait for vm_lock_seq to change and then retry if needed.