Re: [External] Re: [PATCH v3 00/21] Free some vmemmap pages of hugetlb page


From: Muchun Song <hidden>
Date: 2020-11-11 03:21:50
Also in: linux-fsdevel, linux-mm, lkml

On Wed, Nov 11, 2020 at 3:23 AM Mike Kravetz [off-list ref] wrote:


Thanks for continuing to work on this, Muchun!

On 11/8/20 6:10 AM, Muchun Song wrote:
...

quoted

For tail pages, the value of compound_head is the same, so we can reuse
the first page of tail page structs. We map the virtual addresses of the
remaining 6 pages of tail page structs to that first page of tail page
structs, and then free these 6 pages. Therefore, we need to reserve at
least 2 pages as vmemmap areas.
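The remapping trick in the paragraph above can be imitated in userspace, which may help to show why the aliased virtual pages cost no extra memory. This is a hedged analogy, not the kernel's vmemmap code: the helper `map_with_aliases()` is hypothetical, a temporary file stands in for physical memory, and it assumes Linux/POSIX `mmap` with `MAP_FIXED`. Eight virtual pages are backed by only two file pages; the last six all alias the second one read-only, roughly as the six freed vmemmap pages alias the first page of tail page structs.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (userspace analogy, not kernel code).
 * Builds nr_virt pages of virtual address space backed by two pages:
 *   virtual page 0     -> backing page 0 (like the head struct page page)
 *   virtual page 1     -> backing page 1 (first page of tail page structs)
 *   virtual pages 2..  -> backing page 1 again, mapped read-only
 * Returns the base address (page size via *page_out), or NULL on failure.
 */
static unsigned char *map_with_aliases(size_t nr_virt, size_t *page_out)
{
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	FILE *f;
	unsigned char *base;

	assert(nr_virt >= 2);
	f = tmpfile();			/* stands in for physical memory */
	if (!f)
		return NULL;
	if (ftruncate(fileno(f), (off_t)(2 * page)) != 0)
		goto err_file;

	/* Reserve the whole range so MAP_FIXED cannot clobber other mappings. */
	base = mmap(NULL, nr_virt * page, PROT_NONE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		goto err_file;

	for (size_t i = 0; i < nr_virt; i++) {
		off_t off = (i == 0) ? 0 : (off_t)page;
		int prot = (i < 2) ? PROT_READ | PROT_WRITE : PROT_READ;

		if (mmap(base + i * page, page, prot, MAP_SHARED | MAP_FIXED,
			 fileno(f), off) == MAP_FAILED) {
			munmap(base, nr_virt * page);
			goto err_file;
		}
	}
	fclose(f);			/* existing mappings keep the file alive */
	*page_out = page;
	return base;

err_file:
	fclose(f);
	return NULL;
}
```

Writing through virtual page 1 becomes visible at virtual pages 2 through 7, because they all resolve to the same backing page; only two pages of backing storage exist for eight pages of address space.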

When a hugetlb page is freed to the buddy system, we must allocate six
pages for the vmemmap and restore the previous mapping relationship.

If we use 1GB hugetlb pages, we can save 4095 pages. This is a very
substantial gain.

Is that 4095 number accurate?  Are we not using two pages of struct pages
as in the 2MB case?

Oh, yeah, this should be 4094, minus the page table pages. For a 1GB
HugeTLB page, it should be 4086 pages. Thanks for pointing out
this problem.

Also, because we are splitting the huge page mappings in the vmemmap,
additional PTE pages will need to be allocated so that we can free the
pages of struct pages.  The net savings may be less than what is stated
above.

Perhaps this should mention that allocation of additional page table pages
may be required?

Yeah, you are right. In the next version, I will rework the analysis
here to make it clearer and more accurate.
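The corrected figures can be checked with a little arithmetic. The sketch below assumes x86-64 values that the thread does not state: 4KB base pages, a 64-byte `struct page`, and a vmemmap mapped with 2MB PMDs; the helper names are made up for illustration. Under those assumptions a 1GB page has 4096 vmemmap pages, keeping 2 frees 4094, and splitting the 8 PMDs that cover its 16MB of vmemmap costs 8 PTE pages, which gives the 4086 quoted above.

```c
#include <assert.h>

/* Assumed x86-64 values; not stated in the thread. */
#define BASE_PAGE_SIZE   4096UL          /* base page size */
#define STRUCT_PAGE_SIZE 64UL            /* typical sizeof(struct page) */
#define PMD_MAP_SIZE     (2UL << 20)     /* one PMD-mapped vmemmap chunk */
#define RESERVED_PAGES   2UL             /* head + first tail vmemmap page */

/* vmemmap pages holding the struct pages of one HugeTLB page. */
static unsigned long vmemmap_pages(unsigned long huge_size)
{
	assert(huge_size % BASE_PAGE_SIZE == 0);
	return huge_size / BASE_PAGE_SIZE * STRUCT_PAGE_SIZE / BASE_PAGE_SIZE;
}

/* Pages handed back to the buddy allocator, ignoring page tables. */
static unsigned long freeable_pages(unsigned long huge_size)
{
	return vmemmap_pages(huge_size) - RESERVED_PAGES;
}

/*
 * PTE pages needed to split the PMD mappings covering this range.
 * Integer division: a 2MB page's 32KB of vmemmap shares one PTE page
 * with other HugeTLB pages, so this returns 0 for that case.
 */
static unsigned long split_pte_pages(unsigned long huge_size)
{
	return vmemmap_pages(huge_size) * BASE_PAGE_SIZE / PMD_MAP_SIZE;
}

/* Net saving per HugeTLB page once the split page tables are paid for. */
static unsigned long net_saving(unsigned long huge_size)
{
	return freeable_pages(huge_size) - split_pte_pages(huge_size);
}
```

For a 2MB page the 32KB of vmemmap sits inside one PMD that it shares with up to 63 other 2MB pages, so the single PTE page needed for the split is amortized across them.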

...

quoted

Because the vmemmap page tables are rebuilt on the freeing/allocating
path, some overhead is added. Here is an analysis of that overhead.

1) Allocating 10240 2MB hugetlb pages.

   a) With this patch series applied:
   # time echo 10240 > /proc/sys/vm/nr_hugepages

   real     0m0.166s
   user     0m0.000s
   sys      0m0.166s

   # bpftrace -e 'kprobe:alloc_fresh_huge_page { @start[tid] = nsecs; } kretprobe:alloc_fresh_huge_page /@start[tid]/ { @latency = hist(nsecs - @start[tid]); delete(@start[tid]); }'
   Attaching 2 probes...

   @latency:
   [8K, 16K)           8360 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
   [16K, 32K)          1868 |@@@@@@@@@@@                                         |
   [32K, 64K)            10 |                                                    |
   [64K, 128K)            2 |                                                    |

   b) Without this patch series:
   # time echo 10240 > /proc/sys/vm/nr_hugepages

   real     0m0.066s
   user     0m0.000s
   sys      0m0.066s

   # bpftrace -e 'kprobe:alloc_fresh_huge_page { @start[tid] = nsecs; } kretprobe:alloc_fresh_huge_page /@start[tid]/ { @latency = hist(nsecs - @start[tid]); delete(@start[tid]); }'
   Attaching 2 probes...

   @latency:
   [4K, 8K)           10176 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
   [8K, 16K)             62 |                                                    |
   [16K, 32K)             2 |                                                    |

   Summary: this feature is about ~2x slower than before.

2) Freeing 10240 2MB hugetlb pages.

   a) With this patch series applied:
   # time echo 0 > /proc/sys/vm/nr_hugepages

   real     0m0.004s
   user     0m0.000s
   sys      0m0.002s

   # bpftrace -e 'kprobe:__free_hugepage { @start[tid] = nsecs; } kretprobe:__free_hugepage /@start[tid]/ { @latency = hist(nsecs - @start[tid]); delete(@start[tid]); }'
   Attaching 2 probes...

   @latency:
   [16K, 32K)         10240 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|

   b) Without this patch series:
   # time echo 0 > /proc/sys/vm/nr_hugepages

   real     0m0.077s
   user     0m0.001s
   sys      0m0.075s

   # bpftrace -e 'kprobe:__free_hugepage { @start[tid] = nsecs; } kretprobe:__free_hugepage /@start[tid]/ { @latency = hist(nsecs - @start[tid]); delete(@start[tid]); }'
   Attaching 2 probes...

   @latency:
   [4K, 8K)            9950 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
   [8K, 16K)            287 |@                                                   |
   [16K, 32K)             3 |                                                    |

   Summary: __free_hugepage is about ~2-4x slower than before.
              But judging from the allocation test above, I think it is
              also about ~2x slower here.
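The ratios in the two summaries can be roughly cross-checked from the bpftrace histograms alone. This is an approximation: it assumes every sample sits at the midpoint of its power-of-two bucket, since the raw samples are not available, and the helper names are illustrative. The bucket counts are copied from the output above, with bucket lower bounds in units of K.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* One bpftrace hist() bucket [lo, 2*lo), lo in K, with its sample count. */
struct bucket {
	double lo;
	unsigned long count;
};

/* Estimated mean latency, placing every sample at its bucket midpoint (1.5 * lo). */
static double midpoint_mean(const struct bucket *b, size_t n)
{
	double sum = 0.0;
	unsigned long total = 0;

	for (size_t i = 0; i < n; i++) {
		sum += 1.5 * b[i].lo * (double)b[i].count;
		total += b[i].count;
	}
	assert(total > 0);
	return sum / (double)total;
}

/* alloc_fresh_huge_page, 10240 samples each. */
static const struct bucket alloc_patched[] = {
	{ 8, 8360 }, { 16, 1868 }, { 32, 10 }, { 64, 2 },
};
static const struct bucket alloc_unpatched[] = {
	{ 4, 10176 }, { 8, 62 }, { 16, 2 },
};

/* __free_hugepage, 10240 samples each. */
static const struct bucket free_patched[] = {
	{ 16, 10240 },
};
static const struct bucket free_unpatched[] = {
	{ 4, 9950 }, { 8, 287 }, { 16, 3 },
};

/* Estimated slowdown of the patched kernel for one probed function. */
static double slowdown(const struct bucket *patched, size_t np,
		       const struct bucket *unpatched, size_t nu)
{
	return midpoint_mean(patched, np) / midpoint_mean(unpatched, nu);
}
```

With these counts the estimate comes out near 2.4x for allocation and 3.9x for freeing, in line with the ~2x and ~2-4x figures above.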

              But why is the 'real' time smaller with the patch applied?
              Because in this patch series, freeing hugetlb pages is
              asynchronous (done by a kworker).
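Deferred freeing explains why `echo 0` returns quickly: the writer only queues pages, and the expensive vmemmap restoration happens later in a worker. A minimal, single-threaded sketch of that split follows. It assumes a lock-free LIFO list in the style of the kernel's llist; the types and functions are hypothetical and this is not the code from the series. In the kernel the drain step would run from a kworker after something like `schedule_work()`; here the caller invokes it directly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a hugetlb page waiting to be freed. */
struct pending_page {
	struct pending_page *next;
	unsigned long pfn;
};

/* Lock-free LIFO head, loosely modelled on the kernel's llist. */
static _Atomic(struct pending_page *) pending_head = NULL;

/* Fast path (the nr_hugepages writer): O(1) push, no vmemmap work. */
static void defer_free(struct pending_page *p)
{
	struct pending_page *old = atomic_load(&pending_head);

	assert(p != NULL);
	do {
		p->next = old;
	} while (!atomic_compare_exchange_weak(&pending_head, &old, p));
}

/* Placeholder for the slow work: restore the vmemmap, return to buddy. */
static void release_page(struct pending_page *p)
{
	p->pfn = 0;	/* only marks the page released; real work omitted */
}

/* Slow path (a worker): detach the whole list at once, then process it. */
static size_t drain_pending(void (*free_one)(struct pending_page *))
{
	struct pending_page *p = atomic_exchange(&pending_head, NULL);
	size_t n = 0;

	while (p) {
		struct pending_page *next = p->next;	/* read before free_one() */

		free_one(p);
		p = next;
		n++;
	}
	return n;
}
```

Detaching the whole list with one atomic exchange lets producers keep pushing while the worker processes its private batch, so neither side needs a lock.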

Although the overhead has increased, it is not paid on every allocation
or freeing of a hugetlb page; it is paid only once, when we reserve
hugetlb pages through /proc/sys/vm/nr_hugepages. Once the reservation
succeeds, subsequent allocation, freeing and use are the same as before
(unpatched). So I think that the overhead is acceptable.

Thank you for benchmarking.  There are still some instances where huge pages
are allocated 'on the fly' instead of being pulled from the pool.  Michal
pointed out the case of page migration.  It is also possible for someone to
use hugetlbfs without pre-allocating huge pages to the pool.  I remember the
use case pointed out in commit 099730d67417.  It says, "I have a hugetlbfs
user which is never explicitly allocating huge pages with 'nr_hugepages'.
They only set 'nr_overcommit_hugepages' and then let the pages be allocated
from the buddy allocator at fault time."  In this case, I suspect they were
using 'page fault' allocation for initialization much like someone using
/proc/sys/vm/nr_hugepages.  So, the overhead may not be as noticeable.

Thanks for pointing out this use case.

--
Mike Kravetz



-- 
Yours,
Muchun
