Thread: 48 messages, 7 authors, 2020-06-18

Re: mm: mkfs.ext4 invoked oom-killer on i386 - pagecache_get_page

From: Michal Hocko <mhocko@kernel.org>
Date: 2020-06-17 16:06:35
Also in: cgroups, linux-ext4, linux-f2fs-devel, linux-mm, linux-next, lkml
Subsystem: control group - memory resource controller (memcg), the rest · Maintainers: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt, Linus Torvalds

On Wed 17-06-20 21:23:05, Naresh Kamboju wrote:
On Wed, 17 Jun 2020 at 19:41, Michal Hocko wrote:
[Our emails have crossed]

On Wed 17-06-20 14:57:58, Chris Down wrote:
Naresh Kamboju writes:
mkfs -t ext4 /dev/disk/by-id/ata-TOSHIBA_MG04ACA100N_Y8RQK14KF6XF
mke2fs 1.43.8 (1-Jan-2018)
Creating filesystem with 244190646 4k blocks and 61054976 inodes
Filesystem UUID: 7c380766-0ed8-41ba-a0de-3c08e78f1891
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848
Allocating group tables:    0/7453 done
Writing inode tables:    0/7453 done
Creating journal (262144 blocks):
[   51.544525] under min:0 emin:0
[   51.845304] under min:0 emin:0
[   51.848738] under min:0 emin:0
[   51.858147] under min:0 emin:0
[   51.861333] under min:0 emin:0
[   51.862034] under min:0 emin:0
[   51.862442] under min:0 emin:0
[   51.862763] under min:0 emin:0
Thanks, this helps a lot. Somehow we're entering mem_cgroup_below_min even
when min/emin is 0 (which should indeed be the case if you haven't set them
in the hierarchy).

My guess is that page_counter_read(&memcg->memory) is 0, which means
mem_cgroup_below_min will return 1.
Yes, this is the case, because this is likely the root memcg, which skips
all charges.
However, I don't know for sure why that should then result in the OOM killer
coming along. My guess is that since this memcg has 0 pages to scan anyway,
we enter premature OOM under some conditions. I don't know why we wouldn't
have hit that with the old version of mem_cgroup_protected that returned
MEMCG_PROT_* members, though.
Not really. There is likely no other memcg to reclaim from, and treating
the root as min-protected leaves no reclaimable memory at all, hence the
OOM killer.
Can you please try the patch with the `>=` checks in mem_cgroup_below_min
and mem_cgroup_below_low changed to `>`? If that fixes it, then that gives a
strong hint about what's going on here.
This would work, but I believe an explicit check for the root memcg would
make the reasoning easier to spot.
Could you please send debugging or proposed fix patches here?
I am happy to do more testing.
Sure, here is the diff to test.
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index c74a8f2323f1..6b5a31672fbe 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -392,6 +392,13 @@ static inline bool mem_cgroup_below_low(struct mem_cgroup *memcg)
 	if (mem_cgroup_disabled())
 		return false;
 
+	/*
+	 * Root memcg doesn't account charges and doesn't support
+	 * protection
+	 */
+	if (mem_cgroup_is_root(memcg))
+		return false;
+
 	return READ_ONCE(memcg->memory.elow) >=
 		page_counter_read(&memcg->memory);
 }
@@ -401,6 +408,13 @@ static inline bool mem_cgroup_below_min(struct mem_cgroup *memcg)
 	if (mem_cgroup_disabled())
 		return false;
 
+	/*
+	 * Root memcg doesn't account charges and doesn't support
+	 * protection
+	 */
+	if (mem_cgroup_is_root(memcg))
+		return false;
+
 	return READ_ONCE(memcg->memory.emin) >=
 		page_counter_read(&memcg->memory);
 }
-- 
Michal Hocko
SUSE Labs