Re: [PATCH v5 7/9] mm/mremap: Move TLB flush outside page table lock
From: Aneesh Kumar K.V <hidden>
Date: 2021-05-21 03:29:08
Also in:
linuxppc-dev
"Aneesh Kumar K.V" [off-list ref] writes:
On 5/21/21 8:10 AM, Linus Torvalds wrote:quoted
On Thu, May 20, 2021 at 6:57 AM Aneesh Kumar K.V [off-list ref] wrote:quoted
Wondering whether this is correct considering we are holding mmap_sem in write mode in mremap.Right. So *normally* the rule is to EITHER - hold the mmap_sem for writing OR - hold the page table lock and that the TLB flush needs to happen before you release that lock. But as that commit message of commit eb66ae030829 ("mremap: properly flush TLB before releasing the page") says, "mremap()" is a bit special. It's special because mremap() didn't take ownership of the page - it only moved it somewhere else. So now the page-out logic - that relies on the page table lock - can free the page immediately after we've released the page table lock. So basically, in order to delay the TLB flush after releasing the page table lock, it's not really sufficient to _just_ hold the mmap_sem for writing. You also need to guarantee that the lifetime of the page itself is held until after the TLB flush. For normal operations like "munmap()", this happens naturally, because we remove the page from the page table, and add it to the list of pages to be freed after the TLB flush. But mremap never did that "remove the page and add it to a list to be free'd later". Instead, it just moved the page somewhere else. And thus there is no guarantee that the page that got moved will continue to exist until a TLB flush is done. So mremap does need to flush the TLB before releasing the page table lock, because that's the lifetime boundary for the page that got moved.How will we avoid that happening with c49dd340180260c6239e453263a9a244da9a7c85 / 2c91bd4a4e2e530582d6fd643ea7b86b27907151 . The commit improves mremap performance by moving level3/level2 page table entries. When doing so we are not holding level 4 ptl lock (pte_lock()). But rather we are holding pmd_lock or pud_lock(). So if we move pages around without holding the pte lock, won't the above issue happen even if we do a tlb flush with holding pmd lock/pud lock?
This should help? ie, we flush tlb before we move pagetables to the new address? modified mm/mremap.c
@@ -277,11 +277,14 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr, /* Clear the pmd */ pmd = *old_pmd; pmd_clear(old_pmd); - + /* + * flush the TLB before we move the page table entries. + * TLB flush includes necessary barriers. + */ + flush_pte_tlb_pwc_range(vma, old_addr, old_addr + PMD_SIZE); VM_BUG_ON(!pmd_none(*new_pmd)); pmd_populate(mm, new_pmd, pmd_pgtable(pmd)); - flush_pte_tlb_pwc_range(vma, old_addr, old_addr + PMD_SIZE); if (new_ptl != old_ptl) spin_unlock(new_ptl); spin_unlock(old_ptl); -aneesh