Thread: 38 messages, 8 authors, 2015-11-05

Re: [PATCH 5/8] mm: move lazily freed pages to inactive list

From: Minchan Kim <hidden>
Date: 2015-11-05 01:03:24
Also in: linux-mm, lkml

On Wed, Nov 04, 2015 at 09:53:42AM -0800, Shaohua Li wrote:
On Tue, Nov 03, 2015 at 09:52:23AM +0900, Minchan Kim wrote:
On Fri, Oct 30, 2015 at 10:22:12AM -0700, Shaohua Li wrote:
On Fri, Oct 30, 2015 at 04:01:41PM +0900, Minchan Kim wrote:
MADV_FREE is a hint that it's okay to discard pages if there is memory
pressure, and we use reclaimers (i.e., kswapd and direct reclaim) to free
them. There is therefore no value in keeping them on the active anonymous
LRU, so this patch moves them to the head of the inactive LRU list.

This means that MADV_FREE-ed pages which were living on the inactive list
are reclaimed first because they are more likely to be cold than recently
active pages.

An arguable issue for the approach would be whether we should put the page
at the head or the tail of the inactive list.  I chose the head because the
kernel cannot be sure whether a page is really cold or warm for every
MADV_FREE use case, but at least we know it's not *hot*, so landing at the
inactive list's head would be a compromise for various use cases.

This fixes suboptimal behavior of MADV_FREE where pages living on the
active list sit there for a long time even under memory pressure while
the inactive list is reclaimed heavily.  This basically defeats the whole
purpose of using MADV_FREE, which is to help the system free memory that
might not be used.
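
For readers who have not used the hint from userspace, here is a minimal sketch of what an allocator-style caller might do. It is not taken from the patch: lazy_free_range() and lazy_free_demo() are hypothetical names, and the fallback to MADV_DONTNEED is an assumption about how a caller could cope with headers or kernels that lack MADV_FREE.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* expose madvise() and MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the patch): hint that [addr, addr + len)
 * may be discarded lazily. Falls back to MADV_DONTNEED when the headers or
 * the running kernel do not know MADV_FREE. Returns 0 on success, -1 on error.
 */
static int lazy_free_range(void *addr, size_t len)
{
#ifdef MADV_FREE
    if (madvise(addr, len, MADV_FREE) == 0)
        return 0;
    if (errno != EINVAL)
        return -1;
#endif
    return madvise(addr, len, MADV_DONTNEED);
}

/* Map an anonymous region, dirty it, then hand it back lazily. */
static int lazy_free_demo(size_t len)
{
    assert(len > 0);

    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;

    memset(buf, 0xab, len);  /* the pages are now dirty anonymous memory */
    int ret = lazy_free_range(buf, len);

    /*
     * A later write cancels the hint for that page. Its old contents may or
     * may not have survived, so only the newly written byte is reliable.
     */
    buf[0] = 1;
    if (buf[0] != 1)
        ret = -1;

    munmap(buf, len);
    return ret;
}
```

Calling lazy_free_demo(1 << 16) should return 0 on a system where either hint is accepted. Whether the hinted pages are actually reclaimed, and when, is up to the kernel, which is exactly the policy question discussed in this thread.
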
My main concern is the policy for how we should treat the FREE pages. Moving
them to the inactive LRU is definitely a good start, but I'm wondering if it's
enough. MADV_FREE increases memory pressure and causes unnecessary reclaim
because the memory is freed lazily. While MADV_FREE is intended to be a better
replacement for MADV_DONTNEED, MADV_DONTNEED doesn't have the memory pressure
issue, as it frees memory immediately. So I hope MADV_FREE doesn't have an
impact on memory pressure either. I'm thinking of adding an extra LRU list and
watermark for this to make sure FREE pages can be freed before system-wide page
reclaim. As you said, this is arguable, but I hope we can discuss this issue
more.
Yes, it's arguable. ;-)

It seems the divergence comes from the view that MADV_FREE is a *replacement*
for MADV_DONTNEED. But I don't think so. If we could discard a MADV_FREEed page
*anytime*, I would agree, but that's not true because the page may be in a
dirty state when the VM wants to reclaim it.
There certainly are other usage cases, but even your patch log mainly describes
the jemalloc usage case, which uses MADV_DONTNEED.
I'm also against your suggestion to discard FREEed pages before system-wide
page reclaim, because the system could have lots of clean cold page cache
pages or anonymous pages. In such a case, reclaiming those would be better.
Yeah, it's really workload-dependent, so we might need some heuristic, which
is normally what we want to avoid.

Having said that, I agree with you that we could do better than the
deactivation and, frankly speaking, I'm thinking of another LRU list (e.g.,
tentatively named the "ezreclaim LRU list"). What I have in mind is to age
(anon|file|ez) fairly. IOW, I want to percolate ez-LRU list reclaiming into
get_scan_count. When MADV_FREE is called, we could move the hinted pages from
the anon-LRU to the ez-LRU, and then, if the VM finds it cannot discard a page
in the ez-LRU, it could promote it to the active-anon-LRU. That would be a
very natural aging concept because it means someone touched the page recently.

With that, I don't want to bias one side and don't want to add some knob for
tuning the heuristic; let's rely on the common fair aging scheme of the VM.

Another bonus of a new LRU list is that we could support MADV_FREE on swapless
systems.
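
The ez-LRU idea above can be pictured with a toy state machine. This is only an illustrative sketch, not kernel code: every name here (toy_page, toy_madv_free, toy_write, toy_reclaim_ez) is hypothetical, real LRU lists are linked lists walked by the reclaimers, and the fair aging across (anon|file|ez) in get_scan_count is not modeled at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy list identifiers; TOY_EZRECLAIM stands in for the tentative ez-LRU. */
enum toy_lru { TOY_ACTIVE_ANON, TOY_INACTIVE_ANON, TOY_EZRECLAIM, TOY_FREED };

struct toy_page {
    enum toy_lru lru;  /* list the page currently sits on */
    bool dirty;        /* written to since the MADV_FREE hint */
};

/* MADV_FREE hint: move the page from an anon list to the ez list. */
static void toy_madv_free(struct toy_page *page)
{
    assert(page->lru == TOY_ACTIVE_ANON || page->lru == TOY_INACTIVE_ANON);
    page->dirty = false;
    page->lru = TOY_EZRECLAIM;
}

/* Userspace writes to a page that has not been discarded yet. */
static void toy_write(struct toy_page *page)
{
    assert(page->lru != TOY_FREED);
    page->dirty = true;
}

/*
 * Reclaim looks at one ez page: a still-clean page is discarded, while a
 * page dirtied after the hint is promoted to the active anon list.
 * Returns true if the page was freed.
 */
static bool toy_reclaim_ez(struct toy_page *page)
{
    assert(page->lru == TOY_EZRECLAIM);
    if (page->dirty) {
        page->lru = TOY_ACTIVE_ANON;
        return false;
    }
    page->lru = TOY_FREED;
    return true;
}
```

In this model, a page that is hinted and never touched again is freed on the first ez scan, while a page that is hinted and then written lands back on the active anon list, which matches the "someone touched the page recently" reasoning.
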
Or do you want to push this first and address the policy issue later?
I believe adding a new LRU list would be controversial (i.e., not trivial)
from a maintainer's POV even though the code wouldn't be complicated.
So, I want to see problems in *real practice*, not in some theoretical
test program, before diving into that.
To hear such requests, we should release the syscall.
So, I want to push this first.
The memory pressure issue isn't just in artificial tests. In jemalloc, there is
a knob (lg_dirty_mult) to control the rate at which memory should be purged
(using MADV_DONTNEED). We already had several reports in our production
environment that changing the knob can cause extra memory usage (and swap and
so on). If jemalloc uses MADV_FREE, jemalloc will not purge any memory, which
is equivalent to disabling the current MADV_DONTNEED (e.g., lg_dirty_mult = -1).
I'm sure this will cause a similar issue (e.g., extra memory usage, swap). That
said, I don't object to pushing this first, but the memory pressure issue can
happen in real production, and I hope it's not ignored.
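
To make the knob concrete, here is a rough sketch of the kind of threshold it implies. The message only names lg_dirty_mult and says that -1 disables purging; the rule below (dirty pages are allowed up to nactive >> lg_dirty_mult) is an assumption based on jemalloc's documented description of the option as a log2 minimum ratio of active to dirty pages. toy_pages_to_purge() is a hypothetical name, not a jemalloc function.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Assumed rule: dirty pages beyond (nactive >> lg_dirty_mult) should be
 * purged, and a negative lg_dirty_mult disables purging entirely.
 * Returns the number of dirty pages over the allowed limit.
 */
static size_t toy_pages_to_purge(size_t nactive, size_t ndirty,
                                 int lg_dirty_mult)
{
    if (lg_dirty_mult < 0)
        return 0;  /* purging disabled, dirty pages accumulate */

    /* Shifting by the full width of size_t or more is undefined in C. */
    assert((size_t)lg_dirty_mult < sizeof(size_t) * CHAR_BIT);

    size_t allowed = nactive >> lg_dirty_mult;
    return ndirty > allowed ? ndirty - allowed : 0;
}
```

Under this assumed rule, 1024 active pages with lg_dirty_mult = 3 tolerate 128 dirty pages, so 200 dirty pages leave 72 to purge, while lg_dirty_mult = -1 never purges anything. The concern in the message is that lazily freed pages would behave like the latter case until the kernel reclaims them.
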
Absolutely, I'm not saying I want to ignore the concern.
Adding a new LRU would churn many parts of MM, so before that,
let's hear the voice from userland and discuss what's best if it
has trouble.
Thanks,
Shaohua