Re: [PATCH v3 3/4] mm: FLEXIBLE_THP for improved performance | linux-arm-kernel

quoted

On Mon, Jul 17, 2023 at 1:31 PM Yu Zhao [off-list ref] wrote:
On Mon, Jul 17, 2023 at 7:36 AM Ryan Roberts [off-list ref] wrote:

+static int alloc_anon_folio(struct vm_fault *vmf, struct folio **folio)
+{
+       int i;
+       gfp_t gfp;
+       pte_t *pte;
+       unsigned long addr;
+       struct vm_area_struct *vma = vmf->vma;
+       int prefer = anon_folio_order(vma);
+       int orders[] = {
+               prefer,
+               prefer > PAGE_ALLOC_COSTLY_ORDER ? PAGE_ALLOC_COSTLY_ORDER : 0,
+               0,
+       };
+
+       *folio = NULL;
+
+       if (vmf_orig_pte_uffd_wp(vmf))
+               goto fallback;
+
+       for (i = 0; orders[i]; i++) {
+               addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]);
+               if (addr >= vma->vm_start &&
+                   addr + (PAGE_SIZE << orders[i]) <= vma->vm_end)
+                       break;
+       }
+
+       if (!orders[i])
+               goto fallback;
+
+       pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK);
+       if (!pte)
+               return -EAGAIN;
It would be a bug if this happens. So probably -EINVAL?
Not sure what you mean? Hugh Dickins' series that went into v6.5-rc1 makes it
possible for pte_offset_map() to fail (if I understood correctly) and we have to
handle this. The intent is that we will return from the fault without making any
change, then we will refault and try again.
Thanks for checking that -- it's very relevant. One detail is that
that series doesn't affect anon. IOW, collapsing PTEs into a PMD can't
happen while we are holding mmap_lock for read here, and therefore,
the race that could cause pte_offset_map() on shmem/file PTEs to fail
doesn't apply here.
But Hugh's patches have changed do_anonymous_page() to handle failure from
pte_offset_map_lock(). So I was just following that pattern. If this really
can't happen, then I'd rather WARN/BUG on it, and simplify alloc_anon_folio()'s
prototype to just return a `struct folio *` (and if it's null that means ENOMEM).

Hugh, perhaps you can comment?

As an aside, it was my understanding from LWN, that we are now using a per-VMA
lock so presumably we don't hold mmap_lock for read here? Or perhaps that only
applies to file-backed memory?
For anon under mmap_lock for read:
1. pte_offset_map[_lock]() fails when a parallel PF changes PMD from
none to leaf.
2. changing PMD from non-leaf to leaf is a bug. See the comments in
the "else" branch in handle_pte_fault().

So for do_anonymous_page(), there is only one case
pte_offset_map[_lock]() can fail.
===
For the code above, this case was
ruled out by vmf_orig_pte_uffd_wp().
Actually I was wrong about this part.
===

Checking the return value from pte_offset_map[_lock]() is a good
practice. What I'm saying is that -EAGAIN would mislead people to
think, in our case, !pte is legitimate, and hence the suggestion of
replacing it with -EINVAL.
Yes, -EAGAIN is suitable.

No BUG_ON() please. As I've previously mentioned, it's against
Documentation/process/coding-style.rst.

+Hugh Dickins for further consultation if you need it.

+
+       for (; orders[i]; i++) {
+               addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]);
+               vmf->pte = pte + pte_index(addr);
+               if (!vmf_pte_range_changed(vmf, 1 << orders[i]))
+                       break;
+       }
+
+       vmf->pte = NULL;
+       pte_unmap(pte);
+
+       gfp = vma_thp_gfp_mask(vma);
+
+       for (; orders[i]; i++) {
+               addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]);
+               *folio = vma_alloc_folio(gfp, orders[i], vma, addr, true);
+               if (*folio) {
+                       clear_huge_page(&(*folio)->page, addr, 1 << orders[i]);
+                       return 0;
+               }
+       }
+
+fallback:
+       *folio = vma_alloc_zeroed_movable_folio(vma, vmf->address);
+       return *folio ? 0 : -ENOMEM;
+}
+#else
+static inline int alloc_anon_folio(struct vm_fault *vmf, struct folio **folio)
Drop "inline" (it doesn't do anything in .c).
There are 38 instances of inline in memory.c alone, so looks like a well used
convention, even if the compiler may choose to ignore. Perhaps you can educate
me; what's the benefit of dropping it?
I'll let Willy and Andrew educate both of us :)

+Matthew Wilcox +Andrew Morton please. Thank you.

The rest looks good to me.
Great - just incase it wasn't obvious, I decided not to overwrite vmf->address
with the aligned version, as you suggested
Yes, I've noticed. Not overwriting has its own merits for sure.

for 2 reasons; 1) address is const
in the struct, so would have had to change that. 2) there is a uffd path that
can be taken after the vmf->address fixup would have occured and the path
consumes that member, so it would have had to be un-fixed-up making it more
messy than the way I opted for.

Thanks for the quick review as always!
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help