Thread: 50 messages, 9 authors, 2021-08-09

Re: [RFC PATCH 00/15] Make MAX_ORDER adjustable as a kernel boot time parameter.

From: Matthew Wilcox <willy@infradead.org>
Date: 2021-08-07 01:11:33
Also in: lkml

On Fri, Aug 06, 2021 at 01:27:27PM -0700, Hugh Dickins wrote:
> On Fri, 6 Aug 2021, Zi Yan wrote:
> > In addition, I would like to share more detail on my plan on supporting 1GB PUD THP.
> > This patchset is the first step, enabling kernel to allocate 1GB pages, so that
> > user can get 1GB THPs from ZONE_NORMAL and ZONE_MOVABLE without using
> > alloc_contig_pages() or CMA allocator. The next step is to improve kernel memory
> > fragmentation handling for pages up to MAX_ORDER, since currently pageblock size
> > is still limited by memory section size. As a result, I will explore solutions
> > like having additional larger pageblocks (up to MAX_ORDER) to counter memory
> > fragmentation. I will discover what else needs to be solved as I gradually improve
> > 1GB PUD THP support.
>
> Sorry to be blunt, but let me state my opinion: 2MB THPs have given and
> continue to give us more than enough trouble.  Complicating the kernel's
> mm further, just to allow 1GB THPs, seems a very bad tradeoff to me.  I
> understand that it's an appealing personal project; but for the sake
> of all the rest of us, please leave 1GB huge pages to hugetlbfs (until
> the day when we are all using 2MB base pages).

I respect your opinion, Hugh.  You, more than most of us, have spent an
inordinate amount of time debugging huge page related issues.  I also
share your misgivings about the potential performance improvements for
1GB pages.  They're too big for all but the most unusual of special cases.
This hasn't been helped by the scarce number of 1GB TLB entries in Intel
CPUs until very recently (and even those are hard to come by today).
I do not think they are of interest for the page cache (as I'm fond of
observing, if you have 7GB/s storage (eg the Samsung 980 Pro), you can
take seven page faults per second).
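
The arithmetic behind that remark can be sketched as follows. This is only a
back-of-the-envelope illustration, assuming binary units, exactly one page
brought in per fault, and storage running at its full sequential bandwidth;
the function name and constants are made up for the example.

```python
# Assumptions (not from the original message): binary units, one page
# brought in per fault, storage running at full sequential bandwidth.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def faults_per_second(bandwidth_bytes_per_s: int, page_size_bytes: int) -> int:
    """Upper bound on page faults per second that the storage can satisfy."""
    return bandwidth_bytes_per_s // page_size_bytes


bandwidth = 7 * GIB  # the 7GB/s figure quoted in the message

for label, size in [("4KiB", 4 * KIB), ("2MiB", 2 * MIB), ("1GiB", GIB)]:
    print(f"{label:>5} pages: {faults_per_second(bandwidth, size)} faults/s")
```

With 1GB pages the storage can feed only seven faults per second, so the
per-fault overhead that larger pages would save is already negligible next
to the I/O time.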

I am, however, of the opinion that 2MB pages give us so much trouble
because they're so very special.  Few people exercise those code paths and
it's easy to break them without noticing.  This is partly why I want to
do arbitrary-order pages.  If everybody is running with compound pages
all the time, we'll see the corner cases often, and people other than
Hugh, Kirill and Mike will be able to work on them.

Now, I'm not planning on working on arbitrary-order anonymous
pages myself.  I think I have enough to deal with in the page cache &
filesystems.  But I'm happy to help out when I can be useful.  I think
256kB pages are probably optimal at the moment for file-backed memory,
so I'm not planning on exploring the space above PMD_ORDER myself.
But there have already been some important areas of collaboration between
the 1GB effort and the folio effort.
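
For readers less familiar with page orders, the sizes discussed above map to
orders as sketched below. This assumes 4KiB base pages (as on x86-64), where
a PMD-sized page is order 9 and a PUD-sized page is order 18; page_order()
is a hypothetical helper for illustration, not a kernel function.

```python
BASE_PAGE_SIZE = 4 * 1024  # assumption: 4KiB base pages (e.g. x86-64)


def page_order(size_bytes: int, base: int = BASE_PAGE_SIZE) -> int:
    """Return n such that size_bytes == base * 2**n."""
    pages = size_bytes // base
    # Reject sizes that are not a power-of-two multiple of the base page.
    if size_bytes % base or pages < 1 or pages & (pages - 1):
        raise ValueError("size must be a power-of-two multiple of the base page")
    return pages.bit_length() - 1


for label, size in [("256KiB", 256 * 1024), ("2MiB", 2 * 1024**2), ("1GiB", 1024**3)]:
    print(f"{label:>6}: order {page_order(size)}")
```

Under that assumption, 256kB pages sit three orders below PMD_ORDER, and
1GB pages sit nine orders above it.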