Thread (57 messages), 7 authors, 2015-07-31

Re: [PATCH -mm v9 0/8] idle memory tracking

From: Vladimir Davydov <hidden>
Date: 2015-07-29 15:36:58
Also in: linux-api, linux-mm, lkml

On Wed, Jul 29, 2015 at 05:08:55PM +0200, Michal Hocko wrote:
On Wed 29-07-15 17:45:39, Vladimir Davydov wrote:
On Wed, Jul 29, 2015 at 07:12:13AM -0700, Michel Lespinasse wrote:
On Wed, Jul 29, 2015 at 6:59 AM, Vladimir Davydov [off-list ref]
wrote:
I guess the primary reason to rely on the pfn rather than the LRU walk,
which would be more targeted (especially for memcg cases), is that we
cannot hold lru lock for the whole LRU walk and we cannot continue
walking after the lock is dropped. Maybe we can try to address that
instead? I do not think this is easy to achieve but have you considered
that as an option?
Yes, I have, and I've come to the conclusion that it's not doable, because LRU
lists can be constantly rotating at an arbitrary rate. If you have an
idea in mind how this could be done, please share.

Speaking of LRU-vs-PFN walk, iterating over PFNs has its own advantages:
 - You can distribute a walk in time to avoid CPU bursts.
 - You are free to parallelize the scanner as you wish to decrease the
   scan time.
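
As a rough illustration of the two points above (this is not code from the
patch set), a PFN range can be cut into fixed-size chunks that are either
scanned one at a time with a pause in between, or handed to several workers.
All names here (pfn_chunks, scan_paced, scan_parallel) are hypothetical, and
is_idle is a stand-in callback for whatever per-page idle check the real
interface provides:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def pfn_chunks(start_pfn, end_pfn, chunk_size):
    """Yield half-open (lo, hi) PFN ranges covering [start_pfn, end_pfn)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for lo in range(start_pfn, end_pfn, chunk_size):
        yield lo, min(lo + chunk_size, end_pfn)


def count_idle(chunk, is_idle):
    """Count the PFNs in one chunk that the (stand-in) check reports idle."""
    lo, hi = chunk
    return sum(1 for pfn in range(lo, hi) if is_idle(pfn))


def scan_paced(start_pfn, end_pfn, chunk_size, is_idle, pause=0.0):
    """Scan chunk by chunk, sleeping between chunks to avoid a CPU burst."""
    total = 0
    for chunk in pfn_chunks(start_pfn, end_pfn, chunk_size):
        total += count_idle(chunk, is_idle)
        time.sleep(pause)
    return total


def scan_parallel(start_pfn, end_pfn, chunk_size, is_idle, workers=4):
    """Scan independent chunks concurrently to shorten the total scan time."""
    chunks = list(pfn_chunks(start_pfn, end_pfn, chunk_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda c: count_idle(c, is_idle), chunks))
```

Both variants work only because PFN chunks are independent of each other and
stay put between visits, which is exactly what a rotating LRU list does not
offer.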
There is a third way: one could go through every MM in the system and scan
their page tables. Doing things that way turns out to be generally faster
than scanning by physical address, because you don't have to go through
RMAP for every page. But, you end up needing to take the mmap_sem lock of
every MM (in turn) while scanning them, and that degrades quickly under
memory load, which is exactly when you most need this feature. So, scan by
address is still what we use here.
The page table scan approach has an inherent problem: it ignores unmapped
page cache. If a workload does a lot of read/write or map-access-unmap
operations, we won't be able to even roughly estimate its working set size
(wss).
That page cache is trivially reclaimable if it is clean. If it needs
writeback then it is non-idle only until the next writeback. So why does
it matter for the estimation?
Because it might be a part of a workload's working set, in which case
evicting it will make the workload lag.

Thanks,
Vladimir