Re: [PATCH v4 12/25] mm/memory: Enhance insert_page_into_pte_locked() to create writable mappings
From: David Hildenbrand <hidden>
Date: 2025-01-07 11:29:48
Also in:
linux-arm-kernel, linux-cxl, linux-doc, linux-ext4, linux-fsdevel, linux-mm, linux-xfs, lkml, nvdimm
On 06.01.25 03:07, Alistair Popple wrote:
On Fri, Dec 20, 2024 at 08:06:48PM +0100, David Hildenbrand wrote:quoted
On 20.12.24 20:01, David Hildenbrand wrote:quoted
On 17.12.24 06:12, Alistair Popple wrote:quoted
In preparation for using insert_page() for DAX, enhance insert_page_into_pte_locked() to handle establishing writable mappings. Recall that DAX returns VM_FAULT_NOPAGE after installing a PTE which bypasses the typical set_pte_range() in finish_fault. Signed-off-by: Alistair Popple <apopple@nvidia.com> Suggested-by: Dan Williams <redacted> --- Changes since v2: - New patch split out from "mm/memory: Add dax_insert_pfn" --- mm/memory.c | 45 +++++++++++++++++++++++++++++++++++++-------- 1 file changed, 37 insertions(+), 8 deletions(-)diff --git a/mm/memory.c b/mm/memory.c index 06bb29e..cd82952 100644 --- a/mm/memory.c +++ b/mm/memory.c@@ -2126,19 +2126,47 @@ static int validate_page_before_insert(struct vm_area_struct *vma, } static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte, - unsigned long addr, struct page *page, pgprot_t prot) + unsigned long addr, struct page *page, + pgprot_t prot, bool mkwrite) { struct folio *folio = page_folio(page); + pte_t entry = ptep_get(pte); pte_t pteval; - if (!pte_none(ptep_get(pte))) - return -EBUSY; + if (!pte_none(entry)) { + if (!mkwrite) + return -EBUSY; + + /* + * For read faults on private mappings the PFN passed in may not + * match the PFN we have mapped if the mapped PFN is a writeable + * COW page. In the mkwrite case we are creating a writable PTE + * for a shared mapping and we expect the PFNs to match. If they + * don't match, we are likely racing with block allocation and + * mapping invalidation so just skip the update. + */Would it make sense to instead have here /* See insert_pfn(). */ But ...quoted
+ if (pte_pfn(entry) != page_to_pfn(page)) { + WARN_ON_ONCE(!is_zero_pfn(pte_pfn(entry))); + return -EFAULT; + } + entry = maybe_mkwrite(entry, vma); + entry = pte_mkyoung(entry); + if (ptep_set_access_flags(vma, addr, pte, entry, 1)) + update_mmu_cache(vma, addr, pte);... I am not sure if we want the above at all. Someone inserted a page, which is refcounted + mapcounted already. Now you ignore that and do like the second insertion "worked" ? No, that feels wrong, I suspect you will run into refcount+mapcount issues. If there is already something, inserting must fail IMHO. If you want to change something to upgrade write permissions, then a different interface should be used.Ah, now I realize that the early exit saves you because we won't adjust the refcount +mapcount.Right.quoted
I still wonder if that really belongs in here, I would prefer to not play such tricks to upgrade write permissions if possible.As you have pointed out this was all inspired (ie. mostly copied) from the existing insert_pfn() implementation which is used from vmf_insert_mixed{_mkwrite}(). I agree a different interface to upgrade permissions would be nice. However it's tricky because in general callers of these functions (eg. FS DAX) aren't aware if the page is already mapped by a PTE/PMD. They only know a fault has occured and the faulting permissions. This wouldn't be impossible to fix - the mm does provide vm_ops->page_mkwrite() for permission upgrades. The difficulty is that most filesystems that support FS DAX (ie. ext4, XFS) don't treat a vm_ops->page_mkwrite() call any differently from a vm_ops->fault() call due to write fault. Therefore the FS DAX code is unaware of whether or not this is a permission upgrade or initial writeable mapping of the page in the VMA. A further issue in there is currently no vm_ops->huge_mkwrite() callback. Obviously this could all be plumbed through the MM/FS layers, but that would require a separate patch series. Given the current implementation has no issues beyond the cosmetic I'd rather not delay this series any longer, especially as the cosmetic defect is largely pre-existing (vmf_insert_mixed{_mkwrite}() could have equally had a separate upgrade interface).
Fine with me, just stumbled over it an thought "that looks odd". -- Cheers, David / dhildenb