Thread: 51 messages, 9 authors, 2011-05-25

Re: Kernel falls apart under light memory pressure (i.e. linking vmlinux)

From: Andrew Lutomirski <hidden>
Date: 2011-05-22 12:22:49
Also in: lkml


On Sat, May 21, 2011 at 10:44 AM, Minchan Kim [off-list ref] wrote:
Hi Andrew.

On Sat, May 21, 2011 at 10:34 PM, Andrew Lutomirski [off-list ref] wrote:
quoted
On Sat, May 21, 2011 at 8:04 AM, KOSAKI Motohiro
[off-list ref] wrote:
quoted
quoted
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3f44b81..d1dabc9 100644
@@ -1426,8 +1437,13 @@ shrink_inactive_list(unsigned long nr_to_scan, struct zone *zone,

       /* Check if we should syncronously wait for writeback */
       if (should_reclaim_stall(nr_taken, nr_reclaimed, priority, sc)) {
+               unsigned long nr_active, old_nr_scanned;
               set_reclaim_mode(priority, sc, true);
+               nr_active = clear_active_flags(&page_list, NULL);
+               count_vm_events(PGDEACTIVATE, nr_active);
+               old_nr_scanned = sc->nr_scanned;
               nr_reclaimed += shrink_page_list(&page_list, zone, sc);
+               sc->nr_scanned = old_nr_scanned;
       }

       local_irq_disable();

I just tested 2.6.38.6 with the attached patch.  It survived dirty_ram
and test_mempressure without any problems other than slowness, but
when I hit ctrl-c to stop test_mempressure, I got the attached oom.
Minchan,

I'm confused now.
If pages got SetPageActive(), should_reclaim_stall() should never return true.
Can you please explain what bad scenario happened?

-----------------------------------------------------------------------------------------------------
static void reset_reclaim_mode(struct scan_control *sc)
{
       sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
}

shrink_page_list()
{
 (snip)
 activate_locked:
               SetPageActive(page);
               pgactivate++;
               unlock_page(page);
               reset_reclaim_mode(sc);                  /// here
               list_add(&page->lru, &ret_pages);
       }
-----------------------------------------------------------------------------------------------------


-----------------------------------------------------------------------------------------------------
bool should_reclaim_stall()
{
 (snip)

       /* Only stall on lumpy reclaim */
       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)   /// and here
               return false;
-----------------------------------------------------------------------------------------------------
I did some tracing and the oops happens from the second call to
shrink_page_list after should_reclaim_stall returns true and it hits
the same pages in the same order that the earlier call just finished
calling SetPageActive on.  I have *not* confirmed that the two calls
happened from the same call to shrink_inactive_list, but something's
certainly wrong in there.

This is very easy to reproduce on my laptop.
I would like to confirm this problem.
Could you show the diff between vanilla 2.6.38.6 and your current
2.6.38.6 + alpha?
(i.e., I would like to know which patches you added on top of vanilla
2.6.38.6 to reproduce this problem)
I believe you added my crappy patch below, right?
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 292582c..69d317e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -311,7 +311,8 @@ static void set_reclaim_mode(int priority, struct scan_control *sc,
       */
      if (sc->order > PAGE_ALLOC_COSTLY_ORDER)
              sc->reclaim_mode |= syncmode;
-       else if (sc->order && priority < DEF_PRIORITY - 2)
+       else if ((sc->order && priority < DEF_PRIORITY - 2) ||
+                               priority <= DEF_PRIORITY / 3)
              sc->reclaim_mode |= syncmode;
      else
              sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
@@ -1349,10 +1350,6 @@ static inline bool should_reclaim_stall(unsigned long nr_taken,
      if (current_is_kswapd())
              return false;

-       /* Only stall on lumpy reclaim */
-       if (sc->reclaim_mode & RECLAIM_MODE_SINGLE)
-               return false;
-
Bah.  It's this last hunk.  Without this I can't reproduce the oops.
With this hunk, the reset_reclaim_mode doesn't work and
shrink_page_list is incorrectly called twice.

So we're back to the original problem...

--Andy
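
For readers following along, here is a small user-space sketch of the interaction described above. It is not kernel code: the flag values, the reduced struct scan_control, the keep_single_check switch, and count_shrink_calls() are all assumptions made for illustration, and should_reclaim_stall() is cut down to the one check under discussion plus a simplified "not everything was reclaimed" condition. It only aims to show why dropping the RECLAIM_MODE_SINGLE check could let the second shrink_page_list() call happen even after reset_reclaim_mode() has run.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; NOT copied from the kernel headers. */
enum {
	RECLAIM_MODE_SINGLE = 1 << 0,
	RECLAIM_MODE_ASYNC  = 1 << 1,
	RECLAIM_MODE_LUMPY  = 1 << 2,
};

/* Reduced stand-in for the kernel's struct scan_control. */
struct scan_control {
	unsigned int reclaim_mode;
};

/* Same assignment as the reset_reclaim_mode() quoted in the thread. */
static void reset_reclaim_mode(struct scan_control *sc)
{
	sc->reclaim_mode = RECLAIM_MODE_SINGLE | RECLAIM_MODE_ASYNC;
}

/*
 * Toy should_reclaim_stall(). keep_single_check is a hypothetical
 * switch: true models the vanilla function, false models the function
 * with the "Only stall on lumpy reclaim" hunk removed.
 */
static bool should_reclaim_stall(unsigned long nr_taken,
				 unsigned long nr_reclaimed,
				 const struct scan_control *sc,
				 bool keep_single_check)
{
	/* Only stall on lumpy reclaim */
	if (keep_single_check && (sc->reclaim_mode & RECLAIM_MODE_SINGLE))
		return false;

	/* Simplification: stall whenever some isolated pages remain. */
	return nr_reclaimed != nr_taken;
}

/* Counts how often the page list would go through shrink_page_list(). */
static int count_shrink_calls(bool keep_single_check)
{
	struct scan_control sc = {
		.reclaim_mode = RECLAIM_MODE_LUMPY | RECLAIM_MODE_ASYNC,
	};
	unsigned long nr_taken = 32, nr_reclaimed = 0;
	int calls = 1;	/* the first, unconditional pass */

	/* The first pass hits activate_locked, which resets the mode. */
	reset_reclaim_mode(&sc);
	assert(sc.reclaim_mode & RECLAIM_MODE_SINGLE);

	/* The second pass only happens if the stall check allows it. */
	if (should_reclaim_stall(nr_taken, nr_reclaimed, &sc,
				 keep_single_check))
		calls++;

	return calls;
}
```

In this model, count_shrink_calls(true) gives one pass and count_shrink_calls(false) gives two, which matches the behaviour reported above: once the check is gone, the reset done at activate_locked no longer stops the same page list from being shrunk a second time.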
