Thread (50 messages) 50 messages, 8 authors, 2020-06-18

Re: mm: mkfs.ext4 invoked oom-killer on i386 - pagecache_get_page

From: Michal Hocko <mhocko@kernel.org>
Date: 2020-06-17 14:00:01
Also in: cgroups, linux-block, linux-ext4, linux-f2fs-devel, linux-mm, lkml

On Wed 17-06-20 19:07:20, Naresh Kamboju wrote:
On Thu, 21 May 2020 at 22:04, Michal Hocko [off-list ref] wrote:
quoted
On Thu 21-05-20 11:55:16, Michal Hocko wrote:
quoted
On Wed 20-05-20 20:09:06, Chris Down wrote:
quoted
Hi Naresh,

Naresh Kamboju writes:
quoted
As a part of investigation on this issue LKFT teammate Anders Roxell
git bisected the problem and found bad commit(s) which caused this problem.

The following two patches have been reverted on next-20200519 and retested the
reproducible steps and confirmed the test case mkfs -t ext4 got PASS.
( invoked oom-killer is gone now)

Revert "mm, memcg: avoid stale protection values when cgroup is above
protection"
   This reverts commit 23a53e1c02006120f89383270d46cbd040a70bc6.

Revert "mm, memcg: decouple e{low,min} state mutations from protection
checks"
   This reverts commit 7b88906ab7399b58bb088c28befe50bcce076d82.
Thanks Anders and Naresh for tracking this down and reverting.

I'll take a look tomorrow. I don't see anything immediately obviously wrong
in either of those commits from a (very) cursory glance, but they should
only be taking effect if protections are set.
Agreed. If memory.{low,min} is not used then the patch should be
effectively a nop.
I was staring into the code and do not see anything.  Could you give the
following debugging patch a try and see whether it triggers?
diff --git a/mm/vmscan.c b/mm/vmscan.c
index cc555903a332..df2e8df0eb71 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2404,6 +2404,8 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
                         * sc->priority further than desirable.
                         */
                        scan = max(scan, SWAP_CLUSTER_MAX);
+
+                       trace_printk("scan:%lu protection:%lu\n", scan, protection);
                } else {
                        scan = lruvec_size;
                }
@@ -2648,6 +2650,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
                mem_cgroup_calculate_protection(target_memcg, memcg);

                if (mem_cgroup_below_min(memcg)) {
+                       trace_printk("under min:%lu emin:%lu\n", memcg->memory.min, memcg->memory.emin);
                        /*
                         * Hard protection.
                         * If there is no reclaimable memory, OOM.
@@ -2660,6 +2663,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
                         * there is an unprotected supply
                         * of reclaimable memory from other cgroups.
                         */
+                       trace_printk("under low:%lu elow:%lu\n", memcg->memory.low, memcg->memory.elow);
                        if (!sc->memcg_low_reclaim) {
                                sc->memcg_low_skipped = 1;
                                continue;
As per your suggestions on debugging this problem,
trace_printk is replaced with printk and applied to your patch on top of the
problematic kernel and here is the test output and link.

mkfs -t ext4 /dev/disk/by-id/ata-TOSHIBA_MG04ACA100N_Y8RQK14KF6XF
mke2fs 1.43.8 (1-Jan-2018)
Creating filesystem with 244190646 4k blocks and 61054976 inodes
Filesystem UUID: 7c380766-0ed8-41ba-a0de-3c08e78f1891
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848
Allocating group tables:    0/7453 done
Writing inode tables:    0/7453 done
Creating journal (262144 blocks): [   51.544525] under min:0 emin:0
[   51.845304] under min:0 emin:0
[   51.848738] under min:0 emin:0
[   51.858147] under min:0 emin:0
[   51.861333] under min:0 emin:0
[   51.862034] under min:0 emin:0
[   51.862442] under min:0 emin:0
[   51.862763] under min:0 emin:0

Full test log link,
https://lkft.validation.linaro.org/scheduler/job/1497412#L1451
Thanks a lot. So it is clear that mem_cgroup_below_min got confused and
reported protected cgroup. Both effective and real limits are 0 so there
is no garbage in them. The problem is in mem_cgroup_below_* and it is
quite obvious.

We are doing the following
+static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
+{
+       if (mem_cgroup_disabled())
+               return false;
+
+       return READ_ONCE(memcg->memory.emin) >=
+               page_counter_read(&memcg->memory);
+}

and it makes some sense. Except for the root memcg where we do not
account any memory. Adding if (mem_cgroup_is_root(memcg)) return false;
should do the trick. The same is the case for mem_cgroup_below_low.
Could you give it a try please just to confirm?
-- 
Michal Hocko
SUSE Labs
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help