Thread (55 messages) 55 messages, 8 authors, 2012-11-01

Re: zram OOM behavior

From: Luigi Semenzato <hidden>
Date: 2012-11-01 15:58:14

(Sorry, slip of finger.)

On Thu, Nov 1, 2012 at 8:57 AM, Luigi Semenzato [off-list ref] wrote:
On Thu, Nov 1, 2012 at 1:28 AM, Mel Gorman [off-list ref] wrote:
quoted
On Wed, Oct 31, 2012 at 09:48:57PM -0700, David Rientjes wrote:
quoted
On Thu, 1 Nov 2012, Minchan Kim wrote:
quoted
It's not true any more.
3.6 includes following code in try_to_free_pages

        /*
         * Do not enter reclaim if fatal signal is pending. 1 is returned so
         * that the page allocator does not consider triggering OOM
         */
        if (fatal_signal_pending(current))
                return 1;

So the hunged task never go to the OOM path and could be looping forever.
Ah, interesting.  This is from commit 5515061d22f0 ("mm: throttle direct
reclaimers if PF_MEMALLOC reserves are low and swap is backed by network
storage").  Thanks for adding Mel to the cc.
Indeed, thanks.
quoted
The oom killer specifically has logic for this condition: when calling
out_of_memory() the first thing it does is

      if (fatal_signal_pending(current))
              set_thread_flag(TIF_MEMDIE);

to allow it access to memory reserves so that it may exit if it's having
trouble.  But that ends up never happening because of the above code that
Minchan has identified.

So we either need to do set_thread_flag(TIF_MEMDIE) in try_to_free_pages()
as well or revert that early return entirely; there's no justification
given for it in the comment nor in the commit log.
The check for fatal signal is in the wrong place. The reason it was added
is because a throttled process sleeps in an interruptible sleep.  If a user
user forcibly kills a throttled process, it should not result in an OOM kill.
quoted
I'd rather remove it
and allow the oom killer to trigger and grant access to memory reserves
itself if necessary.

Mel, how does commit 5515061d22f0 deal with threads looping forever if
they need memory in the exit path since the oom killer never gets called?
It doesn't. How about this?

---8<---
mm: vmscan: Check for fatal signals iff the process was throttled

commit 5515061d22f0 ("mm: throttle direct reclaimers if PF_MEMALLOC reserves
are low and swap is backed by network storage") introduced a check for
fatal signals after a process gets throttled for network storage. The
intention was that if a process was throttled and got killed that it
should not trigger the OOM killer. As pointed out by Minchan Kim and
David Rientjes, this check is in the wrong place and too broad. If a
system is in am OOM situation and a process is exiting, it can loop in
__alloc_pages_slowpath() and calling direct reclaim in a loop. As the
fatal signal is pending it returns 1 as if it is making forward progress
and can effectively deadlock.

This patch moves the fatal_signal_pending() check after throttling to
throttle_direct_reclaim() where it belongs.

If this patch passes review it should be considered a -stable candidate
for 3.6.

Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/vmscan.c |   37 +++++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 10 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2b7edfa..ca9e37f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2238,9 +2238,12 @@ static bool pfmemalloc_watermark_ok(pg_data_t *pgdat)
  * Throttle direct reclaimers if backing storage is backed by the network
  * and the PFMEMALLOC reserve for the preferred node is getting dangerously
  * depleted. kswapd will continue to make progress and wake the processes
- * when the low watermark is reached
+ * when the low watermark is reached.
+ *
+ * Returns true if a fatal signal was delivered during throttling. If this
+ * happens, the page allocator should not consider triggering the OOM killer.
  */
-static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
+static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
                                        nodemask_t *nodemask)
 {
        struct zone *zone;
@@ -2255,13 +2258,20 @@ static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
         * processes to block on log_wait_commit().
         */
        if (current->flags & PF_KTHREAD)
-               return;
+               goto out;
+
+       /*
+        * If a fatal signal is pending, this process should not throttle.
+        * It should return quickly so it can exit and free its memory
+        */
+       if (fatal_signal_pending(current))
+               goto out;

        /* Check if the pfmemalloc reserves are ok */
        first_zones_zonelist(zonelist, high_zoneidx, NULL, &zone);
        pgdat = zone->zone_pgdat;
        if (pfmemalloc_watermark_ok(pgdat))
-               return;
+               goto out;

        /* Account for the throttling */
        count_vm_event(PGSCAN_DIRECT_THROTTLE);
@@ -2277,12 +2287,20 @@ static void throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
        if (!(gfp_mask & __GFP_FS)) {
                wait_event_interruptible_timeout(pgdat->pfmemalloc_wait,
                        pfmemalloc_watermark_ok(pgdat), HZ);
-               return;
+
+               goto check_pending;
        }

        /* Throttle until kswapd wakes the process */
        wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
                pfmemalloc_watermark_ok(pgdat));
+
+check_pending:
+       if (fatal_signal_pending(current))
+               return true;
+
+out:
+       return false;
 }

 unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
@@ -2304,13 +2322,12 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
                .gfp_mask = sc.gfp_mask,
        };

-       throttle_direct_reclaim(gfp_mask, zonelist, nodemask);
-
        /*
-        * Do not enter reclaim if fatal signal is pending. 1 is returned so
-        * that the page allocator does not consider triggering OOM
+        * Do not enter reclaim if fatal signal was delivered while throttled.
+        * 1 is returned so that the page allocator does not OOM kill at this
+        * point.
         */
-       if (fatal_signal_pending(current))
+       if (throttle_direct_reclaim(gfp_mask, zonelist, nodemask))
                return 1;

        trace_mm_vmscan_direct_reclaim_begin(order,

So which one should I try first, David's change or Mel's?

Does Mel's change take into account the fact that the exiting process
is already deep into do_exit() (exit_mm() to be precise) when it tries
to allocate?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help