Re: [PATCH] mm: Mark idle page tracking as BROKEN
From: David Hildenbrand <hidden>
Date: 2021-06-18 12:48:49
On 16.06.21 21:23, Yu Zhao wrote:
On Wed, Jun 16, 2021 at 2:43 AM David Hildenbrand [off-list ref] wrote:
On 16.06.21 10:36, Vlastimil Babka wrote:
On 6/16/21 8:22 AM, Yu Zhao wrote:
On Tue, Jun 15, 2021 at 8:55 PM Matthew Wilcox [off-list ref] wrote:
I don't know. I asked the others on the call and the answer I got was essentially "Just delete it". I'm kind of hoping the others speak up.

I listed a couple of things when acking this patch. Being broken is not a problem as long as there are users who care about it. What made me think such users may not exist is that nobody ever complained about those things until we stumbled on them -- I'm not insisting on deleting this feature, just clarifying why I thought so.

Similar feelings here. On the call it looked like the feature was abandoned by its creators, and it wasn't clear if the distros that had it enabled did so due to reasons that still apply for future versions. Sending the proposal and getting feedback that there are users is one of the expected valid outcomes.

For us (RH) it will be very interesting to know the exact things that are "suboptimal" (I'm avoiding the terminology "broken" here), so we can actually evaluate if this might affect customers and might be worth "improving".

I consider the examples I gave in my first email breakages -- others broke/break the idle page tracking -- and I think it's safe to assume they will continue to happen.

Right, just as with any other feature that has very bad (no?) upstream
test coverage and doesn't immediately blow up if not done 100% right.
So to summarize (thanks for the input!):
1. It was really broken on arm64 before we had 07509e10dcc7 ("arm64:
pgtable: Fix pte_accessible()") but should be working now.
2. Functions that call pte/pmd_mkold() but not test_and_clear_young()
are shaky.
3. MADV_FREE'ed pages won't actually get freed and are instead treated
as if they were reaccessed, because page_referenced() will return true
upon seeing PageYoung().
4. Huge page handling is suboptimal and requires proper care from user
space to get it right:
https://lore.kernel.org/linux-mm/20210614081610.16123-1-sjpark@amazon.de/
I suspect DAMON will have similar interest in optimizing 2 and 3, right?
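To illustrate the "proper care from user space" in item 4, here is a minimal sketch of how a PFN maps into /sys/kernel/mm/page_idle/bitmap, which is an array of 8-byte words with PFN i at bit i % 64 of word i / 64. The helper names are hypothetical, and head_pfn() assumes the caller already knows the PFN belongs to a naturally aligned 2 MiB huge page (512 base pages of 4 KiB), for example from /proc/kpageflags. Since the idle flag is only kept on the head page, user space would query the head PFN rather than each tail.

```python
# Sketch only: helper names and the 512-pages-per-huge-page default are
# assumptions for illustration, not taken from this thread.
WORD_BYTES = 8       # the bitmap is an array of 8-byte integers
BITS_PER_WORD = 64   # one bit per PFN


def bitmap_location(pfn):
    """Return (byte_offset, bit_index) of a PFN in the page_idle bitmap."""
    if pfn < 0:
        raise ValueError("PFN must be non-negative")
    word = pfn // BITS_PER_WORD
    return word * WORD_BYTES, pfn % BITS_PER_WORD


def head_pfn(pfn, pages_per_huge_page=512):
    """Head PFN of the naturally aligned huge page containing pfn.

    Assumes the caller has already established that pfn is part of a
    huge page of the given size.
    """
    return pfn - (pfn % pages_per_huge_page)


if __name__ == "__main__":
    pfn = 1000
    print("PFN", pfn, "->", bitmap_location(pfn))
    print("head PFN", head_pfn(pfn), "->", bitmap_location(head_pfn(pfn)))
```

Reads and writes of the bitmap have to be done in whole 8-byte words, so a tool would seek to the byte offset, read one word in native byte order, and test the bit index.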
If you are really looking for improvements, page compaction has always been a good example. For idle page tracking, with physical memory as little as 4GB, it needs to go through one million PFNs, no matter how many compound or buddy pages there are. For THPs, it will try to get_page_unless_zero() on tail pages, which always fails. This is why we discussed it in the meeting.
Right, this sounds sub-optimal.
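As a rough back-of-the-envelope check of the numbers quoted above, a minimal sketch assuming 4 KiB base pages and 2 MiB THPs (typical x86-64 values; neither is stated in this thread). The function names are hypothetical.

```python
# Sketch only: page sizes are assumptions, not taken from this thread.
PAGE_SIZE = 4096              # assumed base page size in bytes
THP_SIZE = 2 * 1024 * 1024    # assumed transparent huge page size in bytes


def pfn_count(mem_bytes, page_size=PAGE_SIZE):
    """Number of PFNs a full scan has to visit for mem_bytes of RAM."""
    return mem_bytes // page_size


def tail_pages_per_thp(thp_size=THP_SIZE, page_size=PAGE_SIZE):
    """Tail pages in one THP; each is a lookup that cannot succeed."""
    return thp_size // page_size - 1


def wasted_lookups(mem_bytes, thp_size=THP_SIZE, page_size=PAGE_SIZE):
    """Tail-page lookups if all of mem_bytes were backed by THPs."""
    return (mem_bytes // thp_size) * tail_pages_per_thp(thp_size, page_size)


if __name__ == "__main__":
    four_gib = 4 * 1024 ** 3
    print("PFNs to scan:", pfn_count(four_gib))
    print("tail pages per THP:", tail_pages_per_thp())
    print("wasted lookups (all THP):", wasted_lookups(four_gib))
```

Under these assumptions a 4 GiB machine has 1,048,576 PFNs, and in the extreme case where all of it is THP-backed, 511 out of every 512 lookups land on a tail page.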
What can't be improved is the memory locality of PFNs. They are not grouped by memcgs or processes. Two PFNs next to each other can be from two processes with two sets of five-level page tables. The cache misses simply outweigh any potential benefits one might get from this feature, speaking as one of the customers.
Right.
--
Thanks,
David / dhildenb