Thread (54 messages) 54 messages, 4 authors, 2020-11-17

Re: [External] Re: [PATCH v3 05/21] mm/hugetlb: Introduce pgtable allocation/freeing helpers

From: Muchun Song <hidden>
Date: 2020-11-11 03:42:20
Also in: linux-fsdevel, linux-mm, lkml

On Wed, Nov 11, 2020 at 8:47 AM Mike Kravetz [off-list ref] wrote:
On 11/8/20 6:10 AM, Muchun Song wrote:
quoted
On x86_64, vmemmap is always PMD mapped if the machine has hugepages
support and if we have 2MB contiguos pages and PMD aligned. If we want
to free the unused vmemmap pages, we have to split the huge pmd firstly.
So we should pre-allocate pgtable to split PMD to PTE.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a0007902fafb..5c7be2ee7e15 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1303,6 +1303,108 @@ static inline void destroy_compound_gigantic_page(struct page *page,
...
quoted
+static inline void vmemmap_pgtable_init(struct page *page)
+{
+     page_huge_pte(page) = NULL;
+}
+
+static void vmemmap_pgtable_deposit(struct page *page, pgtable_t pgtable)
+{
+     /* FIFO */
+     if (!page_huge_pte(page))
+             INIT_LIST_HEAD(&pgtable->lru);
+     else
+             list_add(&pgtable->lru, &page_huge_pte(page)->lru);
+     page_huge_pte(page) = pgtable;
+}
+
+static pgtable_t vmemmap_pgtable_withdraw(struct page *page)
+{
+     pgtable_t pgtable;
+
+     /* FIFO */
+     pgtable = page_huge_pte(page);
+     page_huge_pte(page) = list_first_entry_or_null(&pgtable->lru,
+                                                    struct page, lru);
+     if (page_huge_pte(page))
+             list_del(&pgtable->lru);
+     return pgtable;
+}
+
+static int vmemmap_pgtable_prealloc(struct hstate *h, struct page *page)
+{
+     int i;
+     pgtable_t pgtable;
+     unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
+
+     if (!nr)
+             return 0;
+
+     vmemmap_pgtable_init(page);
+
+     for (i = 0; i < nr; i++) {
+             pte_t *pte_p;
+
+             pte_p = pte_alloc_one_kernel(&init_mm);
+             if (!pte_p)
+                     goto out;
+             vmemmap_pgtable_deposit(page, virt_to_page(pte_p));
+     }
+
+     return 0;
+out:
+     while (i-- && (pgtable = vmemmap_pgtable_withdraw(page)))
+             pte_free_kernel(&init_mm, page_to_virt(pgtable));
+     return -ENOMEM;
+}
+
+static void vmemmap_pgtable_free(struct hstate *h, struct page *page)
+{
+     pgtable_t pgtable;
+     unsigned int nr = pgtable_pages_to_prealloc_per_hpage(h);
+
+     if (!nr)
+             return;
+
+     pgtable = page_huge_pte(page);
+     if (!pgtable)
+             return;
+
+     while (nr-- && (pgtable = vmemmap_pgtable_withdraw(page)))
+             pte_free_kernel(&init_mm, page_to_virt(pgtable));
+}
I may be confused.

In patch 9 of this series, the first call to vmemmap_pgtable_free() is made:
quoted
@@ -1645,6 +1799,10 @@ void free_huge_page(struct page *page)

 static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 {
+     free_huge_page_vmemmap(h, page);
+     /* Must be called before the initialization of @page->lru */
+     vmemmap_pgtable_free(h, page);
+
      INIT_LIST_HEAD(&page->lru);
      set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
      set_hugetlb_cgroup(page, NULL);
When I saw that comment in previous patch series, I assumed page->lru was
being used to store preallocated pages and pages to free.  However, unless
Yeah, you are right.
I am reading the code incorrectly it does not appear page->lru (of the huge
page) is being used for this purpose.  Is that correct?

If it is correct, would using page->lru of the huge page make this code
simpler?  I am just missing the reason why you are using
page_huge_pte(page)->lru
For 1GB HugeTLB pages, we should pre-allocate more than one page
table. So I use a linked list. The page_huge_pte(page) is the list head.
Because the page->lru shares storage with page->pmd_huge_pte.

+     /* Must be called before the initialization of @page->lru */
+     vmemmap_pgtable_free(h, page);
+
       INIT_LIST_HEAD(&page->lru);

Here we initialize the lru. So the vmemmap_pgtable_free should
be called before this. It seems like that I should point out this "share"
in the comment.

Thanks.
--
Mike Kravetz


-- 
Yours,
Muchun
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help