Thread: 13 messages, 7 authors, 2021-06-18

Re: [PATCH] mm: Mark idle page tracking as BROKEN

From: David Hildenbrand <hidden>
Date: 2021-06-15 07:41:44

On 12.06.21 02:07, Matthew Wilcox (Oracle) wrote:

> In discussion with other MM developers around how idle page tracking
> should be fixed for transparent huge pages, several expressed the opinion
> that it should be removed as it is inefficient at accomplishing the
> job that it is supposed to, and we have better mechanisms (eg uffd) for
> accomplishing the same goals these days.

I might be missing something important, so some questions/comments:

1. A link to that discussion would be nice. Some important details are 
missing from this patch description.

2. "should be fixed for transparent huge pages" -- has it always been 
like this, or has the behavior changed at some point? Do the semantics, 
and how the feature is being used, clearly identify this case as 
something that really has to be fixed? Or was it always like that and 
actually expected to work like that (i.e., by its semantics)?

For example, just like for softdirty tracking, an over-indication might 
be just fine. More extreme, I think idle tracking can actually deal with 
an under-indication, if it's actually used for minor performance 
improvements (just like with DAMON). Again, this should be clarified in 
this patch description.

Because I read "This information can be useful for estimating the 
workload’s working set size, which, in turn, can be taken into account 
when configuring the workload parameters, setting memory cgroup limits, 
or deciding where to place the workload within a compute cluster." -- 
which doesn't sound like it has to be 100% correct in all of the cases.

And "Since the idle memory tracking feature is based on the memory 
reclaimer logic, it only works with pages that are on an LRU list, other 
pages are silently ignored. That means it will ignore a user memory page 
if it is isolated, but since there are usually not many of them, it 
should not affect the overall result noticeably. In order not to stall 
scanning of the idle page bitmap, locked pages may be skipped too", so 
there are already special cases to deal with.
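Since the discussion turns on what the interface actually reports, here is a minimal C sketch of how user space typically drives idle page tracking. It assumes the layout from the kernel's idle page tracking documentation (one bit per PFN in /sys/kernel/mm/page_idle/bitmap, native-endian 64-bit chunks, reads and writes aligned to 8 bytes) and the /proc/<pid>/pagemap entry format (bit 63 = present, bits 0-54 = PFN). All helper names are hypothetical, and real use generally needs root and CONFIG_IDLE_PAGE_TRACKING, since unprivileged pagemap readers see zeroed PFNs.

```c
/*
 * Minimal sketch of the idle page tracking user-space interface.
 * Helper names are hypothetical; the layout follows the kernel's idle
 * page tracking and pagemap documentation.
 */
#define _XOPEN_SOURCE 700
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define PAGE_IDLE_BITMAP "/sys/kernel/mm/page_idle/bitmap"
#define PM_PRESENT  (1ULL << 63)        /* pagemap: page present in RAM */
#define PM_PFN_MASK ((1ULL << 55) - 1)  /* pagemap: bits 0-54 hold the PFN */

/* Offset of the 8-byte pagemap entry describing virtual address vaddr. */
off_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
    return (off_t)(vaddr / page_size) * 8;
}

/* PFN stored in a pagemap entry, or -1 if the page is not present. */
int64_t pagemap_entry_pfn(uint64_t entry)
{
    if (!(entry & PM_PRESENT))
        return -1;
    return (int64_t)(entry & PM_PFN_MASK);
}

/* PFN i lives in bit i % 64 of the 64-bit chunk at byte offset (i / 64) * 8. */
off_t idle_chunk_offset(uint64_t pfn)
{
    return (off_t)(pfn / 64) * 8;
}

uint64_t idle_bit_mask(uint64_t pfn)
{
    return 1ULL << (pfn % 64);
}

/* Mark one page idle: set bits mark pages idle, clear bits are ignored. */
int mark_page_idle(int bitmap_fd, uint64_t pfn)
{
    uint64_t chunk = idle_bit_mask(pfn);
    ssize_t n = pwrite(bitmap_fd, &chunk, sizeof chunk, idle_chunk_offset(pfn));
    return n == (ssize_t)sizeof chunk ? 0 : -1;
}

/*
 * 1 if the page is still idle, 0 if it was accessed since being marked
 * (or is not trackable at all, e.g. not on an LRU list), -1 on error.
 */
int page_is_idle(int bitmap_fd, uint64_t pfn)
{
    uint64_t chunk;
    ssize_t n = pread(bitmap_fd, &chunk, sizeof chunk, idle_chunk_offset(pfn));
    if (n != (ssize_t)sizeof chunk)
        return -1;
    return (chunk & idle_bit_mask(pfn)) != 0;
}
```

A typical sampling cycle would open PAGE_IDLE_BITMAP with O_RDWR, translate each virtual page of the workload to a PFN via pagemap, call mark_page_idle() on each, wait for the sampling interval, and then count the pages whose bit comes back clear. Because the documentation quoted above says non-LRU, isolated, and locked pages may be ignored or skipped, that count is an estimate of the working set, not an exact figure.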

3. I don't see how the "better mechanisms" can actually be used to 
accomplish the same goal. You state "uffd", however, I don't see a way 
to actually achieve the same goal using uffd. In MISSING mode, you would 
have to zap/discard page content to get an access notification. In WP 
(write-protect) mode, you cannot catch reads at all.

> Mark the feature as BROKEN for now and we can remove it entirely in a
> few months if nobody complains.  It is not enabled by Android, ChromeOS,
> Debian, Fedora or SUSE.  Red Hat enabled it with RHEL-8.1 and UEK followed
> suit, but I have been unable to find why RHEL enabled it.

My fellow RH people tell me that we enabled it in RHEL7 on customer 
demand and consequently enabled it in RHEL8 as well.

To me it feels like we might be removing a feature that is working just 
as expected because the semantics might not be 100% clear to everybody 
involved. Of course, I might be wrong, that's why I'm asking.

I'd actually vote for documenting what's happening, just like we do for 
locked pages and !LRU pages ... unless there is really something heavily 
broken.

-- 
Thanks,

David / dhildenb
