Re: [PATCH RFC 0/9] mm, sparse-vmemmap: Introduce compound pagemaps

From: Joao Martins <hidden>
Date: 2021-02-22 11:07:26
Also in: nvdimm

On 2/20/21 1:18 AM, Dan Williams wrote:

On Tue, Dec 8, 2020 at 9:32 AM Joao Martins [off-list ref] wrote:

quoted

The link above describes it quite nicely, but the idea is to reuse tail
page vmemmap areas, particularly the area which only describes tail pages.
So a vmemmap page describes 64 struct pages, and the first page for a given
ZONE_DEVICE vmemmap would contain the head page and 63 tail pages. The second
vmemmap page would contain only tail pages, and that's what gets reused across
the rest of the subsection/section. The bigger the page size, the bigger the
savings (2M hpage -> save 6 vmemmap pages; 1G hpage -> save 4094 vmemmap pages).
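The per-hugepage figures above can be checked with a short sketch. It assumes 4K base pages and a 64-byte struct page (consistent with "a vmemmap page describes 64 struct pages"), and that two vmemmap pages are kept per compound page: the one holding the head page and the one tail-only page that gets reused. The function names are made up for illustration and are not kernel APIs.

```python
BASE_PAGE = 4096     # bytes per base page (assumption: 4K pages)
STRUCT_PAGE = 64     # bytes per struct page (assumption: 64 per vmemmap page)


def vmemmap_pages(hpage_size):
    """Vmemmap pages needed to describe one compound page without reuse."""
    nr_struct_pages = hpage_size // BASE_PAGE
    return nr_struct_pages * STRUCT_PAGE // BASE_PAGE


def saved_vmemmap_pages(hpage_size, kept=2):
    """Vmemmap pages saved when only `kept` pages are backed by real memory.

    kept=2: the page with the head + 63 tails, and one shared tail-only page.
    """
    return vmemmap_pages(hpage_size) - kept


if __name__ == "__main__":
    for name, size in (("2M", 2 << 20), ("1G", 1 << 30)):
        print(name, vmemmap_pages(size), saved_vmemmap_pages(size))
```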

In terms of savings, per 1TB of memory, the struct page cost would go down
with compound pagemap:

* with 2M pages we lose 4G instead of 16G (0.39% instead of 1.5% of total memory)
* with 1G pages we lose 8MB instead of 16G (0.0007% instead of 1.5% of total memory)

Nice!

I failed to mention this in the cover letter, but with this trick we need to build the
vmemmap page tables with base pages for 2M align, as opposed to hugepages in the vmemmap
page tables (as you could probably tell from the patches). This means that we have to
allocate a PMD page, and that costs 2GB per 1TB (as opposed to 4M). This is fixable for
1G align by reusing PMD pages (although I haven't done that in this RFC series).

The footprint reduction is still big, so to reiterate the numbers above (and I will fix this
in the v2 cover letter):

* with 2M pages we lose 4G instead of 16G (0.39% instead of 1.5% of total memory)
* with 1G pages we lose 8MB instead of 16G (0.0007% instead of 1.5% of total memory)

For vmemmap page tables, we need to use base pages for 2M pages. So taking that into
account, in this RFC series:

* with 2M pages we lose 6G instead of 16G (0.586% instead of 1.5% of total memory)
* with 1G pages we lose ~2GB instead of 16G (0.19% instead of 1.5% of total memory)

For 1G align, we are able to reuse vmemmap PMDs that only point to tail pages, so
ultimately we can get the page table overhead from 2GB to 12MB:

* with 1G pages we lose 20MB instead of 16G (0.0019% instead of 1.5% of total memory)
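As a rough cross-check of the per-1TB numbers, here is a sketch that recomputes the struct page cost and then adds the page table overhead. It assumes 4K base pages, 64-byte struct pages and two vmemmap pages kept per compound page. The page table costs (2GB, and 12MB with PMD reuse) are taken as given from the text above, not derived, and the helper names are illustrative only.

```python
BASE_PAGE = 4096            # assumption: 4K base pages
STRUCT_PAGE = 64            # assumption: 64-byte struct page
MIB, GIB, TIB = 1 << 20, 1 << 30, 1 << 40


def struct_page_cost(total, hpage_size=None, kept=2):
    """Bytes of struct pages needed to describe `total` bytes of memory."""
    if hpage_size is None:
        # No reuse: one struct page per base page.
        return total // BASE_PAGE * STRUCT_PAGE
    # With reuse: only `kept` vmemmap pages are backed per compound page.
    return total // hpage_size * kept * BASE_PAGE


def percent_of(total, cost):
    return 100.0 * cost / total


# Page table overheads as stated in the message (not derived here).
PGTABLE_BASEPAGES = 2 * GIB    # vmemmap mapped with base pages
PGTABLE_PMD_REUSE = 12 * MIB   # 1G align with tail-only PMDs reused

if __name__ == "__main__":
    baseline = struct_page_cost(TIB)
    cost_2m = struct_page_cost(TIB, 2 << 20) + PGTABLE_BASEPAGES
    cost_1g = struct_page_cost(TIB, 1 << 30) + PGTABLE_BASEPAGES
    cost_1g_reuse = struct_page_cost(TIB, 1 << 30) + PGTABLE_PMD_REUSE
    for label, cost in (("baseline", baseline), ("2M", cost_2m),
                        ("1G", cost_1g), ("1G + PMD reuse", cost_1g_reuse)):
        print(f"{label}: {cost / MIB:.0f} MiB ({percent_of(TIB, cost):.4f}%)")
```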

quoted

The RDMA patch (patch 8/9) demonstrates the improvement for an existing
user. For unpin_user_pages() we have an additional test to demonstrate the
improvement. The test performs MR reg/unreg continuously and measures its
rate over a given period. So essentially ib_mem_get and ib_mem_release are being
stress tested, which at the end of the day means pin_user_pages_longterm() and
unpin_user_pages() for a scatterlist:

    Before:
    159 rounds in 5.027 sec: 31617.923 usec / round (device-dax)
    466 rounds in 5.009 sec: 10748.456 usec / round (hugetlbfs)

    After:
    305 rounds in 5.010 sec: 16426.047 usec / round (device-dax)
    1073 rounds in 5.004 sec: 4663.622 usec / round (hugetlbfs)
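To put the before/after figures side by side, a small sketch that computes the speedup per backend from the usec/round values quoted above. The dictionary layout and function name are just for illustration.

```python
# usec per round, copied from the results above
BEFORE = {"device-dax": 31617.923, "hugetlbfs": 10748.456}
AFTER = {"device-dax": 16426.047, "hugetlbfs": 4663.622}


def speedup(before, after):
    """Ratio of time per round before vs. after, per backend."""
    return {name: before[name] / after[name] for name in before}


if __name__ == "__main__":
    for name, ratio in speedup(BEFORE, AFTER).items():
        print(f"{name}: {ratio:.2f}x faster per round")
```

Both backends improve by roughly a factor of two, which fits the improvement coming from the unpin path rather than from anything ZONE_DEVICE specific.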

Why does hugetlbfs get faster for a ZONE_DEVICE change? Might answer
that question myself when I get to patch 8.

Because the unpinning improvements aren't ZONE_DEVICE specific.

FWIW, I moved those two offending patches outside of this series:

  https://lore.kernel.org/linux-mm/20210212130843.13865-1-joao.m.martins@oracle.com/ (local)

quoted

Patch 9: Improves {pin,get}_user_pages() and its longterm counterpart. It
is very experimental, and I imported most of follow_hugetlb_page(), except
that we do the same trick as gup-fast. In doing the patch I feel this batching
should live in follow_page_mask(), with that function changed to return a set
of pages/something-else when walking over PMD/PUDs for THP / devmap pages. This
patch then brings the previous test of MR reg/unreg (above) to parity
between device-dax and hugetlbfs.

Some of the patches are a little fresh/WIP (especially patches 3 and 9) and we are
still running tests. Hence the RFC, asking for comments and general direction
of the work before continuing.

Will go look at the code, but I don't see anything scary conceptually
here. The fact that pfn_to_page() does not need to change is among the
most compelling features of this approach.

Glad to hear that :D
