Thread: 49 messages, 5 authors, 2016-09-06

Re: OOM killer changes

From: Michal Hocko <hidden>
Date: 2016-08-02 07:10:13
Subsystem: memory management, memory management - page allocator, the rest · Maintainers: Andrew Morton, Vlastimil Babka, Linus Torvalds

On Mon 01-08-16 14:27:51, Ralf-Peter Rohbeck wrote:
> On 01.08.2016 14:14, Ralf-Peter Rohbeck wrote:
> > On 01.08.2016 13:26, Michal Hocko wrote:
> > > > sdc, sdd and sde each at max speed, with a little bit of garden
> > > > variety IO on sda and sdb.
> > > So do I get it right that the majority of the IO is to those slower USB
> > > disks?  If yes then does lowering the dirty_bytes to something smaller
> > > help?
> > Yes, the vast majority.
> >
> > I set dirty_bytes to 128MiB and started a fairly IO and memory intensive
> > process and the OOM killer kicked in within a few seconds.
> >
> > Same with 16MiB dirty_bytes and 1MiB.
> >
> > Some additional IO load from my fast subsystem is enough:
> >
> > At 1MiB dirty_bytes,
> >
> > find /btrfs0/ -type f -exec md5sum {} \;
> >
> > was enough (where /btrfs0 is on a LVM2 LV and the PV is on sda.) It read
> > a few dozen files (random stuff with very mixed file sizes, none very
> > big) until the OOM killer kicked in.
> >
> > I'll try 4.6.
> With Debian 4.6.0.1 (4.6.4-1) it works: Writing to 3 USB drives and running
> each of the 3 tests that triggered the OOM killer in parallel, with default
> dirty settings.

Thanks for retesting! Now that it seems you are able to reproduce this,
could you do some experiments, please? First of all it would be great to
find out why we do not retry the compaction and whether it could make
some progress. The patch below will tell us the first part. Tracepoints 
can tell us the other part. Vlastimil, could you recommend some which
would give us some hints without generating way too much output?
---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 8b3e1341b754..a10b29a918d4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3274,6 +3274,7 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 			*migrate_mode = MIGRATE_SYNC_LIGHT;
 			return true;
 		}
+		pr_info("XXX: compaction_failed\n");
 		return false;
 	}
 
@@ -3283,8 +3284,12 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	 * But do not retry if the given zonelist is not suitable for
 	 * compaction.
 	 */
-	if (compaction_withdrawn(compact_result))
-		return compaction_zonelist_suitable(ac, order, alloc_flags);
+	if (compaction_withdrawn(compact_result)) {
+		int ret = compaction_zonelist_suitable(ac, order, alloc_flags);
+		if (!ret)
+			pr_info("XXX: no zone suitable for compaction\n");
+		return ret;
+	}
 
 	/*
 	 * !costly requests are much more important than __GFP_REPEAT
@@ -3299,6 +3304,7 @@ should_compact_retry(struct alloc_context *ac, int order, int alloc_flags,
 	if (compaction_retries <= max_retries)
 		return true;
 
+	pr_info("XXX: compaction retries fail after %d\n", compaction_retries);
 	return false;
 }
 #else
-- 
Michal Hocko
SUSE Labs
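
Once a kernel with the debugging patch above is running and the OOM killer has been reproduced, the three "XXX:" messages can be tallied to see which exit path of should_compact_retry() was taken. The following is only a sketch: the function name summarize_compact_retry is hypothetical, and it assumes the pr_info() lines reach the kernel log unchanged and that the log text is supplied on standard input (for example from dmesg).

```shell
# Hypothetical helper: count each debug message added by the patch.
# Reads kernel log text on stdin, e.g.:  dmesg | summarize_compact_retry
summarize_compact_retry() {
    awk '
        /XXX: compaction_failed/               { failed++ }
        /XXX: no zone suitable for compaction/ { unsuitable++ }
        /XXX: compaction retries fail after/   { exhausted++ }
        END {
            # Unset awk variables print as 0 with %d.
            printf "compaction_failed=%d\n", failed
            printf "no_suitable_zone=%d\n", unsuitable
            printf "retries_exhausted=%d\n", exhausted
        }
    '
}
```

A non-zero count on only one line would point at the corresponding return in should_compact_retry(); the counts alone do not show whether compaction could have made progress, which is what the tracepoints are meant to answer.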
