Thread (29 messages) 29 messages, 5 authors, 2024-10-28

Re: [PATCH v3 1/5] mm: pagewalk: add the ability to install PTEs

From: Vlastimil Babka <hidden>
Date: 2024-10-25 18:13:28
Also in: linux-alpha, linux-arch, linux-kselftest, linux-mips, linux-mm, lkml

On 10/23/24 18:24, Lorenzo Stoakes wrote:
The existing generic pagewalk logic permits the walking of page tables,
invoking callbacks at individual page table levels via user-provided
mm_walk_ops callbacks.

This is useful for traversing existing page table entries, but precludes
the ability to establish new ones.

Existing mechanism for performing a walk which also installs page table
entries if necessary are heavily duplicated throughout the kernel, each
with semantic differences from one another and largely unavailable for use
elsewhere.

Rather than add yet another implementation, we extend the generic pagewalk
logic to enable the installation of page table entries by adding a new
install_pte() callback in mm_walk_ops. If this is specified, then upon
encountering a missing page table entry, we allocate and install a new one
and continue the traversal.

If a THP huge page is encountered at either the PMD or PUD level we split
it only if there are ops->pte_entry() (or ops->pmd_entry at PUD level),
otherwise if there is only an ops->install_pte(), we avoid the unnecessary
split.

We do not support hugetlb at this stage.

If this function returns an error, or an allocation fails during the
operation, we abort the operation altogether. It is up to the caller to
deal appropriately with partially populated page table ranges.

If install_pte() is defined, the semantics of pte_entry() change - this
callback is then only invoked if the entry already exists. This is a useful
property, as it allows a caller to handle existing PTEs while installing
new ones where necessary in the specified range.

If install_pte() is not defined, then there is no functional difference to
this patch, so all existing logic will work precisely as it did before.

As we only permit the installation of PTEs where a mapping does not already
exist there is no need for TLB management, however we do invoke
update_mmu_cache() for architectures which require manual maintenance of
mappings for other CPUs.

We explicitly do not allow the existing page walk API to expose this
feature as it is dangerous and intended for internal mm use only. Therefore
we provide a new walk_page_range_mm() function exposed only to
mm/internal.h.

Reviewed-by: Jann Horn <jannh@google.com>
Signed-off-by: Lorenzo Stoakes <redacted>
Reviewed-by: Vlastimil Babka <redacted>

Just a small subjective suggestion in case you agree and there's a respin or
followups:
quoted hunk ↗ jump to hunk
@@ -109,18 +131,19 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 
 		if (walk->action == ACTION_AGAIN)
 			goto again;
-
-		/*
-		 * Check this here so we only break down trans_huge
-		 * pages when we _need_ to
-		 */
-		if ((!walk->vma && (pmd_leaf(*pmd) || !pmd_present(*pmd))) ||
-		    walk->action == ACTION_CONTINUE ||
-		    !(ops->pte_entry))
+		if (walk->action == ACTION_CONTINUE)
 			continue;
+		if (!ops->install_pte && !ops->pte_entry)
+			continue; /* Nothing to do. */
+		if (!ops->pte_entry && ops->install_pte &&
+		    pmd_present(*pmd) &&
+		    (pmd_trans_huge(*pmd) || pmd_devmap(*pmd)))
+			continue; /* Avoid unnecessary split. */
Much better now, thanks, but maybe the last 2 parts could be:

if (!ops->pte_entry) {
	if (!ops->install_pte)
		continue; /* Nothing to do. */
	else if (pmd_present(*pmd)
		 && (pmd_trans_huge(*pmd) || pmd_devmap(*pmd)))
		continue; /* Avoid unnecessary split. */
}

Or at least put !ops->pte_entry first in both conditions?
quoted hunk ↗ jump to hunk
 		if (walk->vma)
 			split_huge_pmd(walk->vma, pmd, addr);
+		else if (pmd_leaf(*pmd) || !pmd_present(*pmd))
+			continue; /* Nothing to do. */
 
 		err = walk_pte_range(pmd, addr, next, walk);
 		if (err)
@@ -148,11 +171,14 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
  again:
 		next = pud_addr_end(addr, end);
 		if (pud_none(*pud)) {
-			if (ops->pte_hole)
+			if (ops->install_pte)
+				err = __pmd_alloc(walk->mm, pud, addr);
+			else if (ops->pte_hole)
 				err = ops->pte_hole(addr, next, depth, walk);
 			if (err)
 				break;
-			continue;
+			if (!ops->install_pte)
+				continue;
 		}
 
 		walk->action = ACTION_SUBTREE;
@@ -164,14 +190,20 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 
 		if (walk->action == ACTION_AGAIN)
 			goto again;
-
-		if ((!walk->vma && (pud_leaf(*pud) || !pud_present(*pud))) ||
-		    walk->action == ACTION_CONTINUE ||
-		    !(ops->pmd_entry || ops->pte_entry))
+		if (walk->action == ACTION_CONTINUE)
 			continue;
+		if (!ops->install_pte && !ops->pte_entry && !ops->pmd_entry)
+			continue;  /* Nothing to do. */
+		if (!ops->pmd_entry && !ops->pte_entry && ops->install_pte &&
+		    pud_present(*pud) &&
+		    (pud_trans_huge(*pud) || pud_devmap(*pud)))
+			continue; /* Avoid unnecessary split. */
Ditto.

Thanks!
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help