Thread (22 messages) 22 messages, 4 authors, 2020-02-14

Re: [PATCH 2/3] mm: vmscan: detect file thrashing at the reclaim root

From: Shakeel Butt <hidden>
Date: 2019-11-15 16:52:39
Also in: linux-mm, lkml

On Fri, Nov 15, 2019 at 8:07 AM Johannes Weiner [off-list ref] wrote:
On Thu, Nov 14, 2019 at 03:47:59PM -0800, Shakeel Butt wrote:
quoted
On Thu, Nov 7, 2019 at 12:53 PM Johannes Weiner [off-list ref] wrote:
quoted
We use refault information to determine whether the cache workingset
is stable or transitioning, and dynamically adjust the inactive:active
file LRU ratio so as to maximize protection from one-off cache during
stable periods, and minimize IO during transitions.

With cgroups and their nested LRU lists, we currently don't do this
correctly. While recursive cgroup reclaim establishes a relative LRU
order among the pages of all involved cgroups, refaults only affect
the local LRU order in the cgroup in which they are occuring. As a
result, cache transitions can take longer in a cgrouped system as the
active pages of sibling cgroups aren't challenged when they should be.

[ Right now, this is somewhat theoretical, because the siblings, under
  continued regular reclaim pressure, should eventually run out of
  inactive pages - and since inactive:active *size* balancing is also
  done on a cgroup-local level, we will challenge the active pages
  eventually in most cases. But the next patch will move that relative
  size enforcement to the reclaim root as well, and then this patch
  here will be necessary to propagate refault pressure to siblings. ]

This patch moves refault detection to the root of reclaim. Instead of
remembering the cgroup owner of an evicted page, remember the cgroup
that caused the reclaim to happen. When refaults later occur, they'll
correctly influence the cross-cgroup LRU order that reclaim follows.
Can you please explain how "they'll correctly influence"? I see that
if the refaulted page was evicted due to pressure in some ancestor,
then that's ancestor's refault distance and active file size will be
used to decide to activate the refaulted page but how that is
influencing cross-cgroup LRUs?
I take it the next patch answered your question: Activating a page
inside a cgroup has an effect on how it's reclaimed relative to pages
in sibling cgroups. So the influence part isn't new with this change -
it's about recognizing that an activation has an effect on a wider
scope than just the local cgroup, and considering that scope when
making the decision whether to activate or not.
Thanks for the clarification.
quoted
quoted
@@ -302,6 +330,17 @@ void workingset_refault(struct page *page, void *shadow)
         */
        refault_distance = (refault - eviction) & EVICTION_MASK;

+       /*
+        * The activation decision for this page is made at the level
+        * where the eviction occurred, as that is where the LRU order
+        * during page reclaim is being determined.
+        *
+        * However, the cgroup that will own the page is the one that
+        * is actually experiencing the refault event.
+        */
+       memcg = get_mem_cgroup_from_mm(current->mm);
Why not page_memcg(page)? page is locked.
Nice catch! Indeed, the page is charged and locked at this point, so
we don't have to do another lookup and refcounting dance here.

Delta patch:

---

From 8984f37f3e88b1b39c7d6470b313730093b24474 Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes@cmpxchg.org>
Date: Fri, 15 Nov 2019 09:14:04 -0500
Subject: [PATCH] mm: vmscan: detect file thrashing at the reclaim root fix

Shakeel points out that the page is locked and already charged by the
time we call workingset_refault(). Instead of doing another cgroup
lookup and reference from current->mm we can simply use page_memcg().

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
For the complete patch:

Reviewed-by: Shakeel Butt <redacted>
quoted hunk ↗ jump to hunk
---
 mm/workingset.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/mm/workingset.c b/mm/workingset.c
index f0885d9f41cd..474186b76ced 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -338,7 +338,7 @@ void workingset_refault(struct page *page, void *shadow)
         * However, the cgroup that will own the page is the one that
         * is actually experiencing the refault event.
         */
-       memcg = get_mem_cgroup_from_mm(current->mm);
+       memcg = page_memcg(page);
        lruvec = mem_cgroup_lruvec(memcg, pgdat);

        inc_lruvec_state(lruvec, WORKINGSET_REFAULT);
@@ -349,7 +349,7 @@ void workingset_refault(struct page *page, void *shadow)
         * the memory was available to the page cache.
         */
        if (refault_distance > active_file)
-               goto out_memcg;
+               goto out;

        SetPageActive(page);
        advance_inactive_age(memcg, pgdat);
@@ -360,9 +360,6 @@ void workingset_refault(struct page *page, void *shadow)
                SetPageWorkingset(page);
                inc_lruvec_state(lruvec, WORKINGSET_RESTORE);
        }
-
-out_memcg:
-       mem_cgroup_put(memcg);
 out:
        rcu_read_unlock();
 }
--
2.24.0
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help