Thread (62 messages) 62 messages, 7 authors, 2021-02-02

Re: [External] Re: [PATCH v13 05/12] mm: hugetlb: allocate the vmemmap pages associated with each HugeTLB page

From: David Hildenbrand <hidden>
Date: 2021-01-26 20:16:17
Also in: linux-fsdevel, linux-mm, lkml

On 25.01.21 08:41, Muchun Song wrote:
On Mon, Jan 25, 2021 at 2:40 PM Muchun Song [off-list ref] wrote:
quoted
On Mon, Jan 25, 2021 at 8:05 AM David Rientjes [off-list ref] wrote:
quoted

On Sun, 17 Jan 2021, Muchun Song wrote:
quoted
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index ce4be1fa93c2..3b146d5949f3 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -29,6 +29,7 @@
 #include <linux/sched.h>
 #include <linux/pgtable.h>
 #include <linux/bootmem_info.h>
+#include <linux/delay.h>

 #include <asm/dma.h>
 #include <asm/pgalloc.h>
@@ -40,7 +41,8 @@
  * @remap_pte:               called for each non-empty PTE (lowest-level) entry.
  * @reuse_page:              the page which is reused for the tail vmemmap pages.
  * @reuse_addr:              the virtual address of the @reuse_page page.
- * @vmemmap_pages:   the list head of the vmemmap pages that can be freed.
+ * @vmemmap_pages:   the list head of the vmemmap pages that can be freed
+ *                   or is mapped from.
  */
 struct vmemmap_remap_walk {
      void (*remap_pte)(pte_t *pte, unsigned long addr,
@@ -50,6 +52,10 @@ struct vmemmap_remap_walk {
      struct list_head *vmemmap_pages;
 };

+/* The gfp mask of allocating vmemmap page */
+#define GFP_VMEMMAP_PAGE             \
+     (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN | __GFP_THISNODE)
+
This is unnecessary, just use the gfp mask directly in allocator.
Will do. Thanks.
quoted
quoted
 static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
                            unsigned long end,
                            struct vmemmap_remap_walk *walk)
@@ -228,6 +234,75 @@ void vmemmap_remap_free(unsigned long start, unsigned long end,
      free_vmemmap_page_list(&vmemmap_pages);
 }

+static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
+                             struct vmemmap_remap_walk *walk)
+{
+     pgprot_t pgprot = PAGE_KERNEL;
+     struct page *page;
+     void *to;
+
+     BUG_ON(pte_page(*pte) != walk->reuse_page);
+
+     page = list_first_entry(walk->vmemmap_pages, struct page, lru);
+     list_del(&page->lru);
+     to = page_to_virt(page);
+     copy_page(to, (void *)walk->reuse_addr);
+
+     set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
+}
+
+static void alloc_vmemmap_page_list(struct list_head *list,
+                                 unsigned long start, unsigned long end)
+{
+     unsigned long addr;
+
+     for (addr = start; addr < end; addr += PAGE_SIZE) {
+             struct page *page;
+             int nid = page_to_nid((const void *)addr);
+
+retry:
+             page = alloc_pages_node(nid, GFP_VMEMMAP_PAGE, 0);
+             if (unlikely(!page)) {
+                     msleep(100);
+                     /*
+                      * We should retry infinitely, because we cannot
+                      * handle allocation failures. Once we allocate
+                      * vmemmap pages successfully, then we can free
+                      * a HugeTLB page.
+                      */
+                     goto retry;
Ugh, I don't think this will work, there's no guarantee that we'll ever
succeed and now we can't free a 2MB hugepage because we cannot allocate a
4KB page.  We absolutely have to ensure we make forward progress here.
This can trigger a OOM when there is no memory and kill someone to release
some memory. Right?
quoted
We're going to be freeing the hugetlb page after this succeeeds, can we
not use part of the hugetlb page that we're freeing for this memory
instead?
It seems a good idea. We can try to allocate memory firstly, if successful,
just use the new page to remap (it can reduce memory fragmentation).
If not, we can use part of the hugetlb page to remap. What's your opinion
about this?
If the HugeTLB page is a gigantic page which is allocated from
CMA. In this case, we cannot use part of the hugetlb page to remap.
Right?
Right; and I don't think the "reuse part of a huge page as vmemmap while
freeing, while that part itself might not have a proper vmemmap yet (or
might cover itself now)" is particularly straight forward. Maybe I'm
wrong :)

Also, watch out for huge pages on ZONE_MOVABLE, in that case you also
shouldn't allocate the vmemmap from there ...

-- 
Thanks,

David / dhildenb
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help