Thread: 16 messages, 3 authors, 2012-11-23

Re: zram OOM behavior

From: Minchan Kim <minchan@kernel.org>
Date: 2012-11-12 13:32:28

Sorry for the late reply.
I'm still on a training course through this week, so my responses will be delayed, too.

On Fri, Nov 09, 2012 at 09:50:24AM +0000, Mel Gorman wrote:
On Tue, Nov 06, 2012 at 07:17:20PM +0900, Minchan Kim wrote:
On Tue, Nov 06, 2012 at 08:58:22AM +0000, Mel Gorman wrote:
On Tue, Nov 06, 2012 at 09:25:50AM +0900, Minchan Kim wrote:
On Mon, Nov 05, 2012 at 02:46:14PM +0000, Mel Gorman wrote:
On Sat, Nov 03, 2012 at 07:36:31AM +0900, Minchan Kim wrote:
<SNIP>
In the first version it would never try to enter direct reclaim if a
fatal signal was pending but always claim that forward progress was
being made.
Surely we need a fix to prevent the deadlock with the OOM kill, which is why
I Cc'ed you, and this patch fixes it. My question is why we need
such a fatal-signal-checking trick.

How about this?
Both will work as expected but....
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 10090c8..881619e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2306,13 +2306,6 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 
        throttle_direct_reclaim(gfp_mask, zonelist, nodemask);
 
-       /*
-        * Do not enter reclaim if fatal signal is pending. 1 is returned so
-        * that the page allocator does not consider triggering OOM
-        */
-       if (fatal_signal_pending(current))
-               return 1;
-
        trace_mm_vmscan_direct_reclaim_begin(order,
                                sc.may_writepage,
                                gfp_mask);
 
In this case, after throttling, current will try direct reclaim and,
if it makes forward progress, it will get memory and exit if it has received a KILL signal.
It may be completely unnecessary to reclaim memory if the process that was
throttled and killed just exits quickly. As the fatal signal is pending
it will be able to use the pfmemalloc reserves.
If it can't make forward progress with direct reclaim, it can end up in the OOM path, but
out_of_memory checks for a pending signal on current and allows access to the reserved memory pool
for a quick exit, returning without selecting another victim to kill.
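
The two behaviours being compared can be sketched as a small user-space
model. This is only an illustration of the argument in this thread, not
kernel code: the enum, the function names and the two boolean inputs are
hypothetical, and the real paths in mm/vmscan.c and mm/page_alloc.c have
many more conditions.

```c
#include <stdbool.h>

/* Hypothetical outcomes for one allocation attempt after throttling. */
enum alloc_outcome {
    ALLOC_FROM_RECLAIM,   /* direct reclaim freed enough pages */
    ALLOC_FROM_RESERVES,  /* killed task uses the reserves to exit quickly */
    ALLOC_OOM_KILL        /* the OOM killer has to select a victim */
};

/*
 * Variant without the early check: always try direct reclaim first and
 * fall back to the OOM path, where a pending fatal signal grants access
 * to the reserves instead of killing another task.
 */
enum alloc_outcome outcome_without_check(bool fatal_pending,
                                         bool reclaim_progress)
{
    if (reclaim_progress)
        return ALLOC_FROM_RECLAIM;
    if (fatal_pending)
        return ALLOC_FROM_RESERVES;
    return ALLOC_OOM_KILL;
}

/*
 * Variant with the early check: a task with a fatal signal pending skips
 * reclaim entirely, reports progress, and exits using the reserves.
 */
enum alloc_outcome outcome_with_check(bool fatal_pending,
                                      bool reclaim_progress)
{
    if (fatal_pending)
        return ALLOC_FROM_RESERVES;
    return outcome_without_check(fatal_pending, reclaim_progress);
}
```

In this model the two variants differ only when a fatal signal is pending
and reclaim could still make progress, which is the case the rest of the
thread argues about.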
While this is true, what advantage is there to having a killed process
potentially reclaiming memory it does not need to?
A killed process needs memory in order to terminate. I don't think it's a good idea for it
to use the reserved memory pool unconditionally just because it was throttled and killed.
The reserved memory pool is a very restricted resource for emergencies, so using it
should be a last resort after reclaim fails.
Part of that reclaim can be the process reclaiming its own pages and
putting them in swap just so it can exit shortly afterwards. If it was
throttled in this path, it implies that swap-over-NFS is enabled where
Could we make sure it's only the case for swap-over-NFS?
The PFMEMALLOC reserves being consumed to the point of throttling is only
expected in the case of swap-over-network -- check the pgscan_direct_throttle
counter to be sure. So it's already the case that this throttling logic and
its signal handling is mostly a swap-over-NFS thing. It is possible that
a badly behaving driver using GFP_ATOMIC to allocate long-lived buffers
could force a situation where a process gets throttled but I'm not aware
of a case where this happens today.
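
One way to check the pgscan_direct_throttle counter mentioned above is to
read it from /proc/vmstat. The following is a minimal sketch, assuming the
usual "name value" line format of that file; the helper names are
hypothetical, not an existing API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the value of `name` in /proc/vmstat-style text ("name value"
 * per line), or -1 if the counter is not present.
 */
long long vmstat_counter_from_text(const char *text, const char *name)
{
    size_t name_len = strlen(name);
    const char *line = text;

    while (line && *line) {
        /* Require a space after the key so that "pgscan_direct" does
         * not match "pgscan_direct_throttle". */
        if (strncmp(line, name, name_len) == 0 && line[name_len] == ' ')
            return strtoll(line + name_len + 1, NULL, 10);
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Scan the file at `path` line by line; -1 if unreadable or not found. */
long long read_vmstat_counter(const char *path, const char *name)
{
    char buf[256];
    long long value = -1;
    FILE *fp = fopen(path, "r");

    if (!fp)
        return -1;
    while (value < 0 && fgets(buf, sizeof(buf), fp))
        value = vmstat_counter_from_text(buf, name);
    fclose(fp);
    return value;
}
```

Calling read_vmstat_counter("/proc/vmstat", "pgscan_direct_throttle")
before and after a workload and comparing the two values shows whether any
task was throttled in between.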
I have seen some custom drivers on the embedded side use GFP_ATOMIC freely to
avoid deadlock. Of course, it's not good behavior, but we have to live with it.
We can't even fix it because we don't have the source. :(
I think it can happen if the system has a very slow thumb card.
How? They shouldn't be stuck in throttling in this case. They should be
blocked on IO, congestion wait, dirty throttling etc.
Some block drivers (e.g., mmc) use a thread model with PF_MEMALLOC, so I think
they can get stuck in the throttling logic.
such reclaim in fact might require the pfmemalloc reserves to be used to
allocate network buffers. It's potentially unnecessary work because the
You mean we need pfmemalloc reserve to swap out anon pages by swap-over-NFS?
In very low-memory situations - yes. We can be at the min watermark but
still need to allocate a page for a network buffer to swap out the anon page.
Yes. In this case, you're right. It would be better to use the reserve pool for
just exiting instead of swapping out over the network. But how can you make sure that
we have only anonymous pages when we try to reclaim?
If there are some file-backed pages, we can avoid swapout at that time.
Maybe we need some check.
That would be a fairly invasive set of checks for a corner case: if
swap-over-nfs + critically low + about to OOM + file pages available then
only reclaim files.

It's getting off track as to why we're having this discussion in the first
place -- looping due to improper handling of fatal signal pending.
If a user tunes /proc/sys/vm/swappiness, we could have many page cache pages
when swap-over-NFS happens.
My point is: why should we use the emergency memory pool when we still have
reclaimable memory?
same reserves could have been used to just exit the process.

I'll go your way if you insist because it's not like getting throttled
and killed before exit is a common situation and it should work either
way.
I don't want to insist. I just want to understand the problem and find
a better solution. :)
In that case, I'm going to send the patch to Andrew on Monday and avoid
direct reclaim when a fatal signal is pending in the swap-over-network
case. Are you ok with that?
Sorry, but I don't think your patch is the best approach.
-- 
Mel Gorman
SUSE Labs
-- 
Kind Regards,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .