Re: [PATCH v3 3/4] mm: FLEXIBLE_THP for improved performance
From: Yu Zhao <hidden>
Date: 2023-07-17 20:36:29
Also in:
linux-mm, lkml
On Mon, Jul 17, 2023 at 1:31 PM Yu Zhao [off-list ref] wrote:
On Mon, Jul 17, 2023 at 7:36 AM Ryan Roberts [off-list ref] wrote:quoted
quoted
quoted
quoted
quoted
+static int alloc_anon_folio(struct vm_fault *vmf, struct folio **folio) +{ + int i; + gfp_t gfp; + pte_t *pte; + unsigned long addr; + struct vm_area_struct *vma = vmf->vma; + int prefer = anon_folio_order(vma); + int orders[] = { + prefer, + prefer > PAGE_ALLOC_COSTLY_ORDER ? PAGE_ALLOC_COSTLY_ORDER : 0, + 0, + }; + + *folio = NULL; + + if (vmf_orig_pte_uffd_wp(vmf)) + goto fallback; + + for (i = 0; orders[i]; i++) { + addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]); + if (addr >= vma->vm_start && + addr + (PAGE_SIZE << orders[i]) <= vma->vm_end) + break; + } + + if (!orders[i]) + goto fallback; + + pte = pte_offset_map(vmf->pmd, vmf->address & PMD_MASK); + if (!pte) + return -EAGAIN;It would be a bug if this happens. So probably -EINVAL?Not sure what you mean? Hugh Dickins' series that went into v6.5-rc1 makes it possible for pte_offset_map() to fail (if I understood correctly) and we have to handle this. The intent is that we will return from the fault without making any change, then we will refault and try again.Thanks for checking that -- it's very relevant. One detail is that that series doesn't affect anon. IOW, collapsing PTEs into a PMD can't happen while we are holding mmap_lock for read here, and therefore, the race that could cause pte_offset_map() on shmem/file PTEs to fail doesn't apply here.But Hugh's patches have changed do_anonymous_page() to handle failure from pte_offset_map_lock(). So I was just following that pattern. If this really can't happen, then I'd rather WARN/BUG on it, and simplify alloc_anon_folio()'s prototype to just return a `struct folio *` (and if it's null that means ENOMEM). Hugh, perhaps you can comment? As an aside, it was my understanding from LWN, that we are now using a per-VMA lock so presumably we don't hold mmap_lock for read here? Or perhaps that only applies to file-backed memory?For anon under mmap_lock for read: 1. pte_offset_map[_lock]() fails when a parallel PF changes PMD from none to leaf. 2. changing PMD from non-leaf to leaf is a bug. See the comments in the "else" branch in handle_pte_fault(). So for do_anonymous_page(), there is only one case pte_offset_map[_lock]() can fail.
===
For the code above, this case was ruled out by vmf_orig_pte_uffd_wp().
Actually I was wrong about this part. ===
Checking the return value from pte_offset_map[_lock]() is a good practice. What I'm saying is that -EAGAIN would mislead people to think, in our case, !pte is legitimate, and hence the suggestion of replacing it with -EINVAL.
Yes, -EAGAIN is suitable.
No BUG_ON() please. As I've previously mentioned, it's against Documentation/process/coding-style.rst.quoted
quoted
+Hugh Dickins for further consultation if you need it.quoted
quoted
quoted
+ + for (; orders[i]; i++) { + addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]); + vmf->pte = pte + pte_index(addr); + if (!vmf_pte_range_changed(vmf, 1 << orders[i])) + break; + } + + vmf->pte = NULL; + pte_unmap(pte); + + gfp = vma_thp_gfp_mask(vma); + + for (; orders[i]; i++) { + addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << orders[i]); + *folio = vma_alloc_folio(gfp, orders[i], vma, addr, true); + if (*folio) { + clear_huge_page(&(*folio)->page, addr, 1 << orders[i]); + return 0; + } + } + +fallback: + *folio = vma_alloc_zeroed_movable_folio(vma, vmf->address); + return *folio ? 0 : -ENOMEM; +} +#else +static inline int alloc_anon_folio(struct vm_fault *vmf, struct folio **folio)Drop "inline" (it doesn't do anything in .c).There are 38 instances of inline in memory.c alone, so looks like a well used convention, even if the compiler may choose to ignore. Perhaps you can educate me; what's the benefit of dropping it?I'll let Willy and Andrew educate both of us :) +Matthew Wilcox +Andrew Morton please. Thank you.quoted
quoted
The rest looks good to me.Great - just incase it wasn't obvious, I decided not to overwrite vmf->address with the aligned version, as you suggestedYes, I've noticed. Not overwriting has its own merits for sure.quoted
for 2 reasons; 1) address is const in the struct, so would have had to change that. 2) there is a uffd path that can be taken after the vmf->address fixup would have occured and the path consumes that member, so it would have had to be un-fixed-up making it more messy than the way I opted for. Thanks for the quick review as always!
_______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel