Thread (54 messages) 54 messages, 10 authors, 2024-09-06

Re: [PATCH 06/13] mm/memory: Add dax_insert_pfn

From: Alistair Popple <apopple@nvidia.com>
Date: 2024-09-06 06:25:57
Also in: linux-arm-kernel, linux-cxl, linux-doc, linux-ext4, linux-fsdevel, linux-mm, linux-xfs, lkml, nvdimm

Jan Kara [off-list ref] writes:
On Thu 27-06-24 10:54:21, Alistair Popple wrote:
quoted
Currently to map a DAX page the DAX driver calls vmf_insert_pfn. This
creates a special devmap PTE entry for the pfn but does not take a
reference on the underlying struct page for the mapping. This is
because DAX page refcounts are treated specially, as indicated by the
presence of a devmap entry.

To allow DAX page refcounts to be managed the same as normal page
refcounts introduce dax_insert_pfn. This will take a reference on the
underlying page much the same as vmf_insert_page, except it also
permits upgrading an existing mapping to be writable if
requested/possible.

Signed-off-by: Alistair Popple <apopple@nvidia.com>
Overall this looks good to me. Some comments below.
quoted
---
 include/linux/mm.h |  4 ++-
 mm/memory.c        | 79 ++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 76 insertions(+), 7 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 9a5652c..b84368b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1080,6 +1080,8 @@ int vma_is_stack_for_current(struct vm_area_struct *vma);
 struct mmu_gather;
 struct inode;
 
+extern void prep_compound_page(struct page *page, unsigned int order);
+
You don't seem to use this function in this patch?
Thanks, bad rebase splitting this up. It belongs later in the series.
quoted
diff --git a/mm/memory.c b/mm/memory.c
index ce48a05..4f26a1f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1989,14 +1989,42 @@ static int validate_page_before_insert(struct page *page)
 }
 
 static int insert_page_into_pte_locked(struct vm_area_struct *vma, pte_t *pte,
-			unsigned long addr, struct page *page, pgprot_t prot)
+			unsigned long addr, struct page *page, pgprot_t prot, bool mkwrite)
 {
 	struct folio *folio = page_folio(page);
+	pte_t entry = ptep_get(pte);
 
-	if (!pte_none(ptep_get(pte)))
+	if (!pte_none(entry)) {
+		if (mkwrite) {
+			/*
+			 * For read faults on private mappings the PFN passed
+			 * in may not match the PFN we have mapped if the
+			 * mapped PFN is a writeable COW page.  In the mkwrite
+			 * case we are creating a writable PTE for a shared
+			 * mapping and we expect the PFNs to match. If they
+			 * don't match, we are likely racing with block
+			 * allocation and mapping invalidation so just skip the
+			 * update.
+			 */
+			if (pte_pfn(entry) != page_to_pfn(page)) {
+				WARN_ON_ONCE(!is_zero_pfn(pte_pfn(entry)));
+				return -EFAULT;
+			}
+			entry = maybe_mkwrite(entry, vma);
+			entry = pte_mkyoung(entry);
+			if (ptep_set_access_flags(vma, addr, pte, entry, 1))
+				update_mmu_cache(vma, addr, pte);
+			return 0;
+		}
 		return -EBUSY;
If you do this like:

		if (!mkwrite)
			return -EBUSY;

You can reduce indentation of the big block and also making the flow more
obvious...
Good idea.
quoted
+	}
+
 	/* Ok, finally just insert the thing.. */
 	folio_get(folio);
+	if (mkwrite)
+		entry = maybe_mkwrite(mk_pte(page, prot), vma);
+	else
+		entry = mk_pte(page, prot);
I'd prefer:

	entry = mk_pte(page, prot);
	if (mkwrite)
		entry = maybe_mkwrite(entry, vma);

but I don't insist. Also insert_pfn() additionally has pte_mkyoung() and
pte_mkdirty(). Why was it left out here?
An oversight by me, thanks for pointing it out!
								Honza
  
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help