Re: [PATCH v7 06/12] khugepaged: introduce khugepaged_scan_bitmap for mTHP support
From: Nico Pache <npache@redhat.com>
Date: 2025-05-21 10:23:38
Also in:
linux-doc, linux-mm, lkml
On Tue, May 20, 2025 at 4:09 AM Baolin Wang [off-list ref] wrote:
> Sorry for late reply.
>
> On 2025/5/17 14:47, Nico Pache wrote:
>> On Thu, May 15, 2025 at 9:20 PM Baolin Wang [off-list ref] wrote:
>>> On 2025/5/15 11:22, Nico Pache wrote:
>>>> khugepaged scans anons PMD ranges for potential collapse to a
>>>> hugepage. To add mTHP support we use this scan to instead record
>>>> chunks of utilized sections of the PMD.
>>>>
>>>> khugepaged_scan_bitmap uses a stack struct to recursively scan a
>>>> bitmap that represents chunks of utilized regions. We can then
>>>> determine what mTHP size fits best and in the following patch, we
>>>> set this bitmap while scanning the anon PMD.
>>>>
>>>> A minimum collapse order of 2 is used as this is the lowest order
>>>> supported by anon memory.
>>>>
>>>> max_ptes_none is used as a scale to determine how "full" an order
>>>> must be before being considered for collapse.
>>>>
>>>> When attempting to collapse an order that has its order set to
>>>> "always" lets always collapse to that order in a greedy manner
>>>> without considering the number of bits set.
>>>>
>>>> Signed-off-by: Nico Pache <npache@redhat.com>
>>>
>>> Sigh. You still haven't addressed or explained the issues I
>>> previously raised [1], so I don't know how to review this patch
>>> again...
>>
>> Can you still reproduce this issue?
>
> Yes, I can still reproduce this issue with today's (5/20) mm-new
> branch.
>
> I've disabled PMD-sized THP in my system:
> [root]# cat /sys/kernel/mm/transparent_hugepage/enabled
> always madvise [never]
> [root]# cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
> always inherit madvise [never]
>
> And I tried calling madvise() with MADV_COLLAPSE for anonymous memory,
> and I can still see it collapsing to a PMD-sized THP.

Hi Baolin! Thank you for your reply and willingness to test again :)
I didn't realize we were talking about madvise collapse; this makes
sense now. I also figured out why I could "reproduce" it before. My
script was always enabling the THP settings in two places, and I only
commented out one to test this. But this time I was doing more manual
testing.
The original design of madvise_collapse ignores the sysfs settings and
collapses even if you have an order disabled. I believe this behavior
is wrong, but it is by design. I spent some time playing around with
madvise collapses with and without my changes. This is not new: I
reproduced the issue in 6.11 (Fedora 41), and I think it's been
possible since the inception of madvise collapse 3 years ago. I
noticed similar behavior on one of my RFCs, since it was "breaking"
selftests, and the fix was to reincorporate this broken sysfs
behavior.
7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse")
"This call is independent of the system-wide THP sysfs settings, but
will fail for memory marked VM_NOHUGEPAGE."
The second condition holds true (and fails for VM_NOHUGEPAGE), but I
don't know if we actually want madvise_collapse to be independent of
the system-wide settings.
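
For anyone who wants to check the behavior from userspace, here is a
minimal sketch, not the script used in this thread. It reads the
active THP sysfs value, faults in one PMD-aligned private anonymous
range, and calls madvise() with MADV_COLLAPSE on it. Assumptions: the
helper names are made up for illustration, MADV_COLLAPSE is taken to be
25 (its value in the Linux uapi headers), and the PMD size is taken to
be 2 MiB as on x86-64 with 4 KiB pages.

```python
import ctypes
import mmap
import os

# Assumption: MADV_COLLAPSE as defined in the Linux uapi headers; Python's
# mmap module may not export this constant, so it is hard-coded here.
MADV_COLLAPSE = 25
# Assumption: 2 MiB PMD size (x86-64 with 4 KiB base pages).
PMD_SIZE = 2 * 1024 * 1024


def read_thp_setting(path="/sys/kernel/mm/transparent_hugepage/enabled"):
    """Return the bracketed (active) value of a THP sysfs knob, or None."""
    try:
        with open(path) as f:
            text = f.read()
    except OSError:
        return None
    start, end = text.find("["), text.find("]")
    return text[start + 1:end] if 0 <= start < end else None


def try_madv_collapse():
    """Fault in one PMD-aligned anonymous range and request a collapse.

    Returns 0 if madvise() succeeded, otherwise the errno it reported.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    libc.madvise.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
    libc.madvise.restype = ctypes.c_int

    # Map twice the PMD size so that an aligned PMD-sized window always fits.
    # MAP_PRIVATE matters: the default for mmap.mmap() is a shared mapping.
    region = mmap.mmap(-1, 2 * PMD_SIZE,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    view = ctypes.c_char.from_buffer(region)
    try:
        base = ctypes.addressof(view)
        aligned = (base + PMD_SIZE - 1) & ~(PMD_SIZE - 1)
        offset = aligned - base
        # Touch every base page so the whole window is populated.
        for off in range(offset, offset + PMD_SIZE, mmap.PAGESIZE):
            region[off] = 1
        ctypes.set_errno(0)
        rc = libc.madvise(ctypes.c_void_p(aligned), PMD_SIZE, MADV_COLLAPSE)
        return 0 if rc == 0 else ctypes.get_errno()
    finally:
        del view  # drop the buffer export before closing the mapping
        region.close()


if __name__ == "__main__":
    print("global THP setting:", read_thp_setting())
    err = try_madv_collapse()
    print("madvise(MADV_COLLAPSE):", "ok" if err == 0 else os.strerror(err))
```

A return of 0 while both sysfs knobs read "never" would be consistent
with what Baolin reports. On kernels without MADV_COLLAPSE the call is
expected to fail with EINVAL. Confirming the resulting page size would
still need a look at AnonHugePages in /proc/self/smaps, which this
sketch does not do.
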
So I'll ask the authors
+David Rientjes +zokeefe@google.com
Was this brought up as a concern when this feature was first
introduced? Was there any pushback and, if so, what was the outcome
of the discussion?
I can easily fix this, and it would further simplify the code (by
removing is_khugepaged and friends). As David H. has brought up in
other discussions around similar topics, never should mean never. Is
this the only exception we should allow?
Thanks!

>> I can no longer reproduce this issue, that's why I posted... although
>> I should have followed up, and looked into what the original issue
>> was. Nothing really sticks out so perhaps something in mm-new was
>> broken and pulled out... not sure. It should now follow the expected
>> behavior, which is that no mTHP collapse occurs because if the PMD
>> size is disabled so is khugepaged collapse.
>>
>> Lmk if you are still experiencing this issue please.
>>
>> Cheers,
>> -- Nico