Re: [PATCH v7 06/12] khugepaged: introduce khugepaged_scan_bitmap for mTHP support
From: Baolin Wang <baolin.wang@linux.alibaba.com>
Date: 2025-05-28 14:04:30
Also in:
linux-doc, linux-mm, lkml
On 2025/5/28 17:26, David Hildenbrand wrote:
> On 22.05.25 11:39, Baolin Wang wrote:
>> On 2025/5/21 18:23, Nico Pache wrote:
>>> On Tue, May 20, 2025 at 4:09 AM Baolin Wang [off-list ref] wrote:
>>>> Sorry for late reply.
>>>>
>>>> On 2025/5/17 14:47, Nico Pache wrote:
>>>>> On Thu, May 15, 2025 at 9:20 PM Baolin Wang [off-list ref] wrote:
>>>>>> On 2025/5/15 11:22, Nico Pache wrote:
>>>>>>> khugepaged scans anon PMD ranges for potential collapse to a
>>>>>>> hugepage. To add mTHP support we use this scan to instead record
>>>>>>> chunks of utilized sections of the PMD.
>>>>>>>
>>>>>>> khugepaged_scan_bitmap uses a stack struct to recursively scan a
>>>>>>> bitmap that represents chunks of utilized regions. We can then
>>>>>>> determine what mTHP size fits best and in the following patch, we
>>>>>>> set this bitmap while scanning the anon PMD. A minimum collapse
>>>>>>> order of 2 is used as this is the lowest order supported by anon
>>>>>>> memory.
>>>>>>>
>>>>>>> max_ptes_none is used as a scale to determine how "full" an order
>>>>>>> must be before being considered for collapse.
>>>>>>>
>>>>>>> When attempting to collapse an order that has its order set to
>>>>>>> "always", let's always collapse to that order in a greedy manner
>>>>>>> without considering the number of bits set.
>>>>>>>
>>>>>>> Signed-off-by: Nico Pache <npache@redhat.com>
>>>>>>
>>>>>> Sigh. You still haven't addressed or explained the issues I
>>>>>> previously raised [1], so I don't know how to review this patch
>>>>>> again...
>>>>>
>>>>> Can you still reproduce this issue?
>>>>
>>>> Yes, I can still reproduce this issue with today's (5/20) mm-new
>>>> branch.
>>>>
>>>> I've disabled PMD-sized THP in my system:
>>>> [root]# cat /sys/kernel/mm/transparent_hugepage/enabled
>>>> always madvise [never]
>>>> [root]# cat /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/enabled
>>>> always inherit madvise [never]
>>>>
>>>> And I tried calling madvise() with MADV_COLLAPSE for anonymous
>>>> memory, and I can still see it collapsing to a PMD-sized THP.
>>>
>>> Hi Baolin!
>>>
>>> Thank you for your reply and willingness to test again :)
>>>
>>> I didn't realize we were talking about madvise collapse-- this makes
>>> sense now. I also figured out why I could "reproduce" it before. My
>>> script was always enabling the THP settings in two places, and I only
>>> commented out one to test this. But this time I was doing more manual
>>> testing.
>>>
>>> The original design of madvise_collapse ignores the sysfs and
>>> collapses even if you have an order disabled. I believe this behavior
>>> is wrong, but by design.
>>>
>>> I spent some time playing around with madvise collapses with and
>>> without my changes. This is not a new thing, I reproduced the issue
>>> in 6.11 (Fedora 41), and I think it's been possible since the
>>> inception of madvise collapse 3 years ago. I noticed a similar
>>> behavior on one of my RFCs since it was "breaking" selftests, and the
>>> fix was to reincorporate this broken sysfs behavior.
>>
>> OK. Thanks for the explanation.
>>
>>> 7d8faaf15545 ("mm/madvise: introduce MADV_COLLAPSE sync hugepage
>>> collapse")
>>> "This call is independent of the system-wide THP sysfs settings, but
>>> will fail for memory marked VM_NOHUGEPAGE."
>>>
>>> The second condition holds true (and fails for VM_NOHUGEPAGE), but I
>>> don't know if we actually want madvise_collapse to be independent of
>>> the system-wide settings.
>>
>> This design principle surprised me a bit, and I failed to find the
>> reason in the commit log. I agree that "never should mean never," and
>> we should respect the THP/mTHP sysfs setting. Additionally, for the
>> 'shmem_enabled' sysfs interface controlled for shmem/tmpfs, THP
>> collapse can still be prohibited through the 'deny' configuration.
>> The rules here are somewhat confusing.
>
> I recall that we decided to override "VM_NOHUGEPAGE", because the
> assumption is that the same app that triggered MADV_NOHUGEPAGE triggers
> the collapse. So the app decides on its own behavior.
>
> Similarly, allowing for collapsing in a VMA without VM_HUGEPAGE in the
> "madvise" mode would be fine.
>
> But in the "never" case, we should just "never" collapse.

OK. Let's fix the "never" case first. Thanks.
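
For readers following along, the "never should mean never" rule agreed on above can be sketched in userspace terms. This is only an illustration of the proposed policy, not kernel code and not part of the patch: the helper names (selected_mode, effective_mode, madv_collapse_allowed) are hypothetical. It relies only on the sysfs text format shown earlier in the thread, where the active choice is the bracketed token, and on "inherit" deferring to the top-level setting.

```python
def selected_mode(sysfs_text: str) -> str:
    """Return the active choice from a THP sysfs line.

    The kernel marks the active value with brackets,
    e.g. "always madvise [never]" -> "never".
    """
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no bracketed selection in {sysfs_text!r}")


def effective_mode(global_text: str, per_size_text: str) -> str:
    """Resolve a per-size setting; "inherit" falls back to the global one."""
    mode = selected_mode(per_size_text)
    if mode == "inherit":
        return selected_mode(global_text)
    return mode


def madv_collapse_allowed(global_text: str, per_size_text: str) -> bool:
    """Hypothetical policy from the thread: refuse only when the
    effective mode for that size is "never"; "madvise" and "always"
    both permit an explicit MADV_COLLAPSE request.
    """
    return effective_mode(global_text, per_size_text) != "never"


if __name__ == "__main__":
    # The configuration from the reproduction above: PMD-sized THP disabled.
    global_enabled = "always madvise [never]"
    pmd_enabled = "always inherit madvise [never]"
    print(madv_collapse_allowed(global_enabled, pmd_enabled))
```

With the settings from the reproduction, the sketch reports that a PMD-sized collapse should be refused, which is the behavior the thread agrees to fix first.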