Re: [PATCH v2 6/9] mm: free user PTE page table pages
From: Jason Gunthorpe <jgg@nvidia.com>
Date: 2021-09-01 16:16:21
Also in:
linux-mm, lkml
On Wed, Sep 01, 2021 at 06:13:07PM +0200, David Hildenbrand wrote:
On 01.09.21 17:32, Jason Gunthorpe wrote:quoted
On Wed, Sep 01, 2021 at 03:57:09PM +0200, David Hildenbrand wrote:quoted
On 01.09.21 15:53, Jason Gunthorpe wrote:quoted
On Thu, Aug 19, 2021 at 11:18:55AM +0800, Qi Zheng wrote:quoted
diff --git a/mm/gup.c b/mm/gup.c index 2630ed1bb4f4..30757f3b176c 100644 +++ b/mm/gup.c@@ -500,6 +500,9 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, if (unlikely(pmd_bad(*pmd))) return no_page_table(vma, flags); + if (!pte_try_get(mm, pmd)) + return no_page_table(vma, flags); + ptep = pte_offset_map_lock(mm, pmd, address, &ptl);This is not good on a performance path, the pte_try_get() is locking/locking the same lock that pte_offset_map_lock() is getting.Yes, and we really need patch #8, anything else is just confusing reviewers.It is a bit better with patch 8, but it is still not optimal, we don't need to do the atomic work at all if the entire ptep is accessed while locked. So the above is stil not what I would expect here, even with RCU. eg I would expect that this kind of change would work first with the existing paired acessors, ie pte = pte_offset_map(pmd, address); pte_unmap(pte); Should handle the refcount under the covers, and same kind of idea for the _locked/_unlocked varient.See my other mail.
Do you have a reference?
quoted
Only places that don't already use that pairing should get modified. To do this we have to extend the API so that pte_offset_map() can fail, or very cleverly return some kind of global non-present pte page (I wonder if the zero page would work?)I explored both ideas (returning NULL, return a specially prepared page) and it didn't work in some cases where we unmap+remap etc.
I wouldn't think it works everywhere, bit it works in a lot of places, and it is a heck of a lot better than what is proposed here. I'd rather see the places that can use it be moved, and the few places that can't be opencoded. Jason