Re: [PATCH v3 17/35] mm/mmap: write-lock VMA before shrinking or expanding it
From: Suren Baghdasaryan <surenb@google.com>
Date: 2023-02-24 16:19:41
Also in:
linux-mm, lkml
On Fri, Feb 24, 2023 at 8:14 AM Liam R. Howlett [off-list ref] wrote:
* Suren Baghdasaryan [off-list ref] [230223 21:06]:quoted
On Thu, Feb 23, 2023 at 5:46 PM Liam R. Howlett [off-list ref] wrote:quoted
* Suren Baghdasaryan [off-list ref] [230223 16:16]:quoted
On Thu, Feb 23, 2023 at 12:28 PM Liam R. Howlett [off-list ref] wrote:quoted
Wait, I figured a better place to do this. init_multi_vma_prep() should vma_start_write() on any VMA that is passed in.. that we we catch any modifications here & in vma_merge(), which I think is missed in this patch set?Hmm. That looks like a good idea but in that case, why not do the locking inside vma_prepare() itself? From the description of that function it sounds like it was designed to acquire locks before VMA modifications, so would be the ideal location for doing that. WDYT?That might be even better. I think it will result in even less code.Yes.quoted
There is also a vma_complete() which might work to call vma_end_write_all() as well?If there are other VMAs already locked before vma_prepare() then we would unlock them too. Safer to just let mmap_unlock do vma_end_write_all().quoted
quoted
The only concern is vma_adjust_trans_huge() being called before vma_prepare() but I *think* that's safe because vma_adjust_trans_huge() does its modifications after acquiring PTL lock, which page fault handlers also have to take. Does that sound right?I am not sure. We are certainly safe the way it is, and the PTL has to be safe for concurrent faults.. but this could alter the walk to a page table while that walk is occurring and I don't think that happens today. It might be best to leave the locking order the way you have it, unless someone can tell us it's safe?Yes, I have the same feelings about changing this.quoted
We could pass through the three extra variables that are needed to move the vma_adjust_trans_huge() call within that function as well? This would have the added benefit of having all locking grouped in the one location, but the argument list would be getting long, however we could use the struct.Any issues if I change the order to have vma_prepare() called always before vma_adjust_trans_huge()? That way the VMA will always be locked before vma_adjust_trans_huge() executes and we don't need any additional arguments.I preserved the locking order from __vma_adjust() to ensure there was no issues. I am not sure but, looking through the page table information [1], it seems that vma_adjust_trans_huge() uses the pmd lock, which is part of the split page table lock. According to the comment in rmap, it should be fine to reverse the ordering here. Instead of: mmap_lock() vma_adjust_trans_huge() pte_lock pte_unlock vma_prepare() mapping->i_mmap_rwsem lock anon_vma->rwsem lock <changes to tree/VMAs> vma_complete() anon_vma->rwsem unlock mapping->i_mmap_rwsem unlock mmap_unlock() --------- We would have: mmap_lock() vma_prepare() mapping->i_mmap_rwsem lock anon_vma->rwsem lock vma_adjust_trans_huge() pte_lock pte_unlock <changes to tree/VMAs> vma_complete() anon_vma->rwsem unlock mapping->i_mmap_rwsem unlock mmap_unlock() Essentially, increasing the nesting of the pte lock, but not violating the ordering. 1. https://docs.kernel.org/mm/split_page_table_lock.html
Thanks for the confirmation, Liam. I'll make the changes and test over the weekend. If everything's still fine, I will post the next version with these and other requested changes on Monday.
quoted
quoted
remove & remove2 should be be detached in vma_prepare() or vma_complete() as well?They are marked detached in vma_complete() (see https://lore.kernel.org/all/20230216051750.3125598-25-surenb@google.com/ (local)) and that should be enough. We should be safe as long as we mark them detached before unlocking the VMA.Right, Thanks. ... -- To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.