Re: [PATCH v5 05/26] mm/swap: Introduce the idea of special swap ptes

From: Alistair Popple <apopple@nvidia.com>
Date: 2021-07-16 05:51:26
Also in: lkml

Hi Peter,

[...]

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ae1f5d0cb581..4b46c099ad94 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c

@@ -5738,7 +5738,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
 
 	if (pte_present(ptent))
 		page = mc_handle_present_pte(vma, addr, ptent);
-	else if (is_swap_pte(ptent))
+	else if (pte_has_swap_entry(ptent))
 		page = mc_handle_swap_pte(vma, ptent, &ent);
 	else if (pte_none(ptent))
 		page = mc_handle_file_pte(vma, addr, ptent, &ent);

As I understand it, pte_none() is false for a special swap pte, but shouldn't
a special pte be treated like pte_none() here? I.e., does this need to be
pte_none(ptent) || is_swap_special_pte(ptent)?
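
To illustrate the question, here is a minimal user-space sketch of the
three-way branch in get_mctgt_type(). It is a toy model, not kernel code: the
model_* names and the bit layout are invented for illustration, and it assumes
that pte_has_swap_entry() means "is_swap_pte() and not is_swap_special_pte()",
as the helper names in the patch suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy pte; the encoding below is hypothetical, not any real architecture's. */
typedef struct { uint64_t val; } model_pte_t;

#define MODEL_PTE_PRESENT (1ULL << 0)
#define MODEL_PTE_SPECIAL (1ULL << 1) /* non-none, non-present, no swap entry */

static bool model_pte_none(model_pte_t pte)    { return pte.val == 0; }
static bool model_pte_present(model_pte_t pte) { return (pte.val & MODEL_PTE_PRESENT) != 0; }

static bool model_is_swap_pte(model_pte_t pte)
{
	return !model_pte_none(pte) && !model_pte_present(pte);
}

static bool model_is_swap_special_pte(model_pte_t pte)
{
	return pte.val == MODEL_PTE_SPECIAL;
}

/* Assumed semantics: a swap pte that really carries a swap entry. */
static bool model_pte_has_swap_entry(model_pte_t pte)
{
	return model_is_swap_pte(pte) && !model_is_swap_special_pte(pte);
}

/* Decoding is only meaningful when a swap entry is actually there. */
static uint64_t model_swap_payload(model_pte_t pte)
{
	assert(model_pte_has_swap_entry(pte));
	return pte.val;
}

enum model_branch { BRANCH_PRESENT, BRANCH_SWAP, BRANCH_FILE, BRANCH_UNHANDLED };

/* Branch structure as in the quoted hunk. */
static enum model_branch classify_as_posted(model_pte_t pte)
{
	if (model_pte_present(pte))
		return BRANCH_PRESENT;
	else if (model_pte_has_swap_entry(pte))
		return BRANCH_SWAP;
	else if (model_pte_none(pte))
		return BRANCH_FILE;
	return BRANCH_UNHANDLED; /* a special pte lands here */
}

/* Same structure with the extra condition asked about above. */
static enum model_branch classify_as_suggested(model_pte_t pte)
{
	if (model_pte_present(pte))
		return BRANCH_PRESENT;
	else if (model_pte_has_swap_entry(pte))
		return BRANCH_SWAP;
	else if (model_pte_none(pte) || model_is_swap_special_pte(pte))
		return BRANCH_FILE;
	return BRANCH_UNHANDLED;
}
```

In this model a special pte matches none of the three branches as posted, and
takes the file branch only with the extra condition. Whether the real
get_mctgt_type() should behave that way is the open question.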

diff --git a/mm/memory.c b/mm/memory.c
index 0e0de08a2cd5..998a4f9a3744 100644
--- a/mm/memory.c
+++ b/mm/memory.c

@@ -3491,6 +3491,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!pte_unmap_same(vmf))
 		goto out;
 
+	/*
+	 * We should never call do_swap_page upon a swap special pte; just be
+	 * safe to bail out if it happens.
+	 */
+	if (WARN_ON_ONCE(is_swap_special_pte(vmf->orig_pte)))
+		goto out;
+
 	entry = pte_to_swp_entry(vmf->orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {

Are there other changes required here? Since we can end up with stale special
ptes, and a special pte is !pte_none(), don't we need to fix some of the
!pte_none() checks in these functions:

insert_pfn() -> checks for !pte_none
remap_pte_range() -> BUG_ON(!pte_none)
apply_to_pte_range() -> didn't check further but it tests for !pte_none

In general it feels like I might be missing something here though. There are
plenty of checks in the kernel for pte_none() which haven't been updated. Is
there some rule that says none of those paths can see a special pte?
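
The concern can be modelled in user space as follows. This is only a sketch
under stated assumptions: the toy_* names, the pte encoding, and the
toy_pte_none_or_special() helper are all hypothetical and do not come from the
patch. It contrasts a guard that tests only pte_none() with one that also
accepts a stale special pte.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy pte with an invented encoding. */
typedef struct { uint64_t val; } toy_pte_t;

#define TOY_PTE_SPECIAL 0x2ULL /* hypothetical stale special marker */

static bool toy_pte_none(toy_pte_t pte)            { return pte.val == 0; }
static bool toy_is_swap_special_pte(toy_pte_t pte) { return pte.val == TOY_PTE_SPECIAL; }

/* Hypothetical helper: treat a special pte as an empty slot. */
static bool toy_pte_none_or_special(toy_pte_t pte)
{
	return toy_pte_none(pte) || toy_is_swap_special_pte(pte);
}

/* Mirrors a plain !pte_none() guard: any non-zero slot counts as occupied. */
static int toy_install_strict(toy_pte_t *slot, toy_pte_t newpte)
{
	assert(slot);
	if (!toy_pte_none(*slot))
		return -EBUSY;
	*slot = newpte;
	return 0;
}

/* Also installs over a special pte, overwriting (and so dropping) the marker. */
static int toy_install_relaxed(toy_pte_t *slot, toy_pte_t newpte)
{
	assert(slot);
	if (!toy_pte_none_or_special(*slot))
		return -EBUSY;
	*slot = newpte;
	return 0;
}
```

With a stale special pte in the slot, the strict variant refuses the install
while the relaxed variant succeeds but discards the marker. Which behaviour
each real caller needs, if it can see a special pte at all, is what the
question above is asking.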

diff --git a/mm/migrate.c b/mm/migrate.c
index 23cbd9de030b..b477d0d5f911 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c

@@ -294,7 +294,7 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 
 	spin_lock(ptl);
 	pte = *ptep;
-	if (!is_swap_pte(pte))
+	if (!pte_has_swap_entry(pte))
 		goto out;
 
 	entry = pte_to_swp_entry(pte);

@@ -2276,7 +2276,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
 
 		pte = *ptep;
 
-		if (pte_none(pte)) {
+		if (pte_none(pte) || is_swap_special_pte(pte)) {

I was wondering whether we can lose the special pte information here. However,
I see that in migrate_vma_insert_page() we check again and fail the migration
if !pte_none(), so I think this is OK.

I think it would be better if this check were moved below so the migration
fails early, i.e.:

		if (pte_none(pte)) {
 			if (vma_is_anonymous(vma) && !is_swap_special_pte(pte)) {

Also, how does this work for page migration in general? I can see in
page_vma_mapped_walk() that we skip special ptes, but doesn't this mean we
lose the special pte in that instance? Or is that OK for some reason?

 			if (vma_is_anonymous(vma)) {
 				mpfn = MIGRATE_PFN_MIGRATE;
 				migrate->cpages++;

diff --git a/mm/mincore.c b/mm/mincore.c
index 9122676b54d6..5728c3e6473f 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c

@@ -121,7 +121,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	for (; addr != end; ptep++, addr += PAGE_SIZE) {
 		pte_t pte = *ptep;
 
-		if (pte_none(pte))
+		if (pte_none(pte) || is_swap_special_pte(pte))
 			__mincore_unmapped_range(addr, addr + PAGE_SIZE,
 						 vma, vec);
 		else if (pte_present(pte))

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 883e2cc85cad..4b743394afbe 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c

@@ -139,7 +139,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 			}
 			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
 			pages++;
-		} else if (is_swap_pte(oldpte)) {
+		} else if (pte_has_swap_entry(oldpte)) {
 			swp_entry_t entry = pte_to_swp_entry(oldpte);
 			pte_t newpte;

diff --git a/mm/mremap.c b/mm/mremap.c
index 5989d3990020..122b279333ee 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c

@@ -125,7 +125,7 @@ static pte_t move_soft_dirty_pte(pte_t pte)
 #ifdef CONFIG_MEM_SOFT_DIRTY
 	if (pte_present(pte))
 		pte = pte_mksoft_dirty(pte);
-	else if (is_swap_pte(pte))
+	else if (pte_has_swap_entry(pte))
 		pte = pte_swp_mksoft_dirty(pte);
 #endif
 	return pte;

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index f7b331081791..ff57b67426af 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c

@@ -36,7 +36,7 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw)
 			 * For more details on device private memory see HMM
 			 * (include/linux/hmm.h or mm/hmm.c).
 			 */
-			if (is_swap_pte(*pvmw->pte)) {
+			if (pte_has_swap_entry(*pvmw->pte)) {
 				swp_entry_t entry;
 
 				/* Handle un-addressable ZONE_DEVICE memory */

@@ -90,7 +90,7 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
 
 	if (pvmw->flags & PVMW_MIGRATION) {
 		swp_entry_t entry;
-		if (!is_swap_pte(*pvmw->pte))
+		if (!pte_has_swap_entry(*pvmw->pte))
 			return false;
 		entry = pte_to_swp_entry(*pvmw->pte);

@@ -99,7 +99,7 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw)
 			return false;
 
 		pfn = swp_offset(entry);
-	} else if (is_swap_pte(*pvmw->pte)) {
+	} else if (pte_has_swap_entry(*pvmw->pte)) {
 		swp_entry_t entry;
 
 		/* Handle un-addressable ZONE_DEVICE memory */

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1e07d1c776f2..4993b4454c13 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c

@@ -1951,7 +1951,7 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 	si = swap_info[type];
 	pte = pte_offset_map(pmd, addr);
 	do {
-		if (!is_swap_pte(*pte))
+		if (!pte_has_swap_entry(*pte))
 			continue;
 
 		entry = pte_to_swp_entry(*pte);
