Thread: 14 messages, 3 authors, 2012-06-25

Re: [PATCH V5 5/5] mm: memcg discount pages under softlimit from per-zone reclaimable_pages

From: Ying Han <hidden>
Date: 2012-06-25 21:00:54

On Tue, Jun 19, 2012 at 5:05 AM, Johannes Weiner [off-list ref] wrote:
> On Mon, Jun 18, 2012 at 09:47:31AM -0700, Ying Han wrote:
> > The function zone_reclaimable() marks zone->all_unreclaimable based on
> > per-zone pages_scanned and reclaimable_pages. If all_unreclaimable is true,
> > alloc_pages could go to OOM instead of getting stuck in page reclaim.
>
> There is no zone->all_unreclaimable at this point, you removed it in
> the previous patch.
>
> > In memcg kernel, cgroup under its softlimit is not targeted under global
> > reclaim. So we need to remove those pages from reclaimable_pages, otherwise
> > it will cause reclaim mechanism to get stuck trying to reclaim from
> > all_unreclaimable zone.
>
> Can't you check if zone->pages_scanned changed in between reclaim
> runs?
>
> Or sum up the scanned and reclaimable pages encountered while
> iterating the hierarchy during regular reclaim and then use those
> numbers in the equation instead of the per-zone counters?
>
> Walking the full global hierarchy in all the places where we check if
> a zone is reclaimable is a scalability nightmare.

One way to solve this is to record the per-zone reclaimable pages
(the sum of reclaimable pages of memcgs above their softlimits) after
each shrink_zone(). That function already walks the memcg hierarchy
and checks the softlimit, so we don't need to do it again. The new
value, pages_reclaimed, is recorded per-zone, and the caller can
compare it with zone->pages_scanned.
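
A rough userspace sketch of that idea, not kernel code. The struct and
function names (memcg_model, zone_model, shrink_zone_model,
zone_reclaimable_model) are hypothetical stand-ins, and the "scanned <
reclaimable * 6" comparison is an assumption modeled on the
zone_reclaimable() heuristic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's structures. */
struct memcg_model {
	unsigned long usage;       /* pages charged to the group */
	unsigned long soft_limit;  /* soft limit, in pages */
	unsigned long lru_file;    /* file LRU pages this group has in the zone */
};

struct zone_model {
	unsigned long pages_scanned;
	unsigned long pages_reclaimed; /* recorded by the last hierarchy walk */
};

/*
 * Models the walk that shrink_zone() performs anyway: while visiting each
 * group, sum the file LRU pages of groups above their soft limit and
 * record the total in the zone.
 */
static void shrink_zone_model(struct zone_model *zone,
			      const struct memcg_model *groups, size_t n)
{
	unsigned long reclaimable = 0;
	size_t i;

	assert(zone != NULL);

	for (i = 0; i < n; i++) {
		if (groups[i].usage > groups[i].soft_limit)
			reclaimable += groups[i].lru_file;
	}
	zone->pages_reclaimed = reclaimable;
}

/*
 * Caller-side check against the recorded value; no second hierarchy walk.
 * The factor of 6 is an assumption for this sketch.
 */
static bool zone_reclaimable_model(const struct zone_model *zone)
{
	assert(zone != NULL);
	return zone->pages_scanned < zone->pages_reclaimed * 6;
}
```

The point of the sketch is only that the check becomes O(1) per zone,
since the sum is a by-product of a walk that happens in any case.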

While running tests on the patch, it turned out that I cannot reproduce
the problem (machine hang while over-committing the softlimit) even
w/o the patch. Then I realized that the problem only exists in the
internal version, where we don't have the check
"sc->priority < DEF_PRIORITY - 2" to bypass the softlimit check. The
reason we did that is to guarantee no global pressure on high-priority
memcgs. In that case, global reclaim can never steal any pages from
any memcgs and the system can easily hang.

This is not the case in the version I am posting here. The patch
guarantees that we do not loop over memcgs that are all under their
softlimit by:
1. detecting whether no memcg is above its softlimit, and if so,
skipping the softlimit check
2. only checking the softlimit if priority is >= DEF_PRIORITY - 2
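
The two guards above can be sketched in userspace C roughly as follows.
This is an illustration under assumptions, not the code from 1/5:
soft_group, any_over_softlimit and should_reclaim_model are
hypothetical names, and only DEF_PRIORITY (12 in the kernel) is taken
from the real source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEF_PRIORITY 12 /* same value as the kernel's reclaim default */

/* Hypothetical stand-in for a memcg; not a kernel structure. */
struct soft_group {
	unsigned long usage;      /* pages charged to the group */
	unsigned long soft_limit; /* soft limit, in pages */
};

/* Guard 1 input: is any group above its soft limit at all? */
static bool any_over_softlimit(const struct soft_group *g, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (g[i].usage > g[i].soft_limit)
			return true;
	}
	return false;
}

/* Decide whether global reclaim may target this group. */
static bool should_reclaim_model(const struct soft_group *g,
				 bool any_over, int priority)
{
	assert(g != NULL);

	/* Guard 1: nobody is over its soft limit, so skip the check. */
	if (!any_over)
		return true;

	/* Guard 2: the check only applies while priority >= DEF_PRIORITY - 2. */
	if (priority < DEF_PRIORITY - 2)
		return true;

	return g->usage > g->soft_limit;
}
```

With both guards in place, a group under its softlimit is skipped only
during the first few priority levels and only if some other group is
over its limit, so reclaim always has something to target.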

In summary, the problem described in this patch doesn't exist. So I am
thinking of dropping this one in my next post. Please comment.

--Ying
> > @@ -100,18 +100,36 @@ static __always_inline enum lru_list page_lru(struct page *page)
> >       return lru;
> >  }
> >
> > +static inline unsigned long get_lru_size(struct lruvec *lruvec,
> > +                                      enum lru_list lru)
> > +{
> > +     if (!mem_cgroup_disabled())
> > +             return mem_cgroup_get_lru_size(lruvec, lru);
> > +
> > +     return zone_page_state(lruvec_zone(lruvec), NR_LRU_BASE + lru);
> > +}
> > +
> >  static inline unsigned long zone_reclaimable_pages(struct zone *zone)
> >  {
> > -     int nr;
> > +     int nr = 0;
> > +     struct mem_cgroup *memcg;
> > +
> > +     memcg = mem_cgroup_iter(NULL, NULL, NULL);
> > +     do {
> > +             struct lruvec *lruvec = mem_cgroup_zone_lruvec(zone, memcg);
> >
> > -     nr = zone_page_state(zone, NR_ACTIVE_FILE) +
> > -          zone_page_state(zone, NR_INACTIVE_FILE);
> > +             if (should_reclaim_mem_cgroup(memcg)) {
> > +                     nr += get_lru_size(lruvec, LRU_INACTIVE_FILE) +
> > +                           get_lru_size(lruvec, LRU_ACTIVE_FILE);
>
> Sometimes, the number of reclaimable pages DO include those of groups
> for which should_reclaim_mem_cgroup() is false: when the priority
> level is <= DEF_PRIORITY - 2, as you defined in 1/5!  This means that
> you consider pages you just scanned unreclaimable, which can result in
> the zone being unreclaimable after the DEF_PRIORITY - 2 cycle, no?