Thread: 45 messages, 4 authors, 2011-03-18

Re: [patch] memcg: add oom killer delay

From: KAMEZAWA Hiroyuki <hidden>
Date: 2011-02-24 00:20:08

On Wed, 23 Feb 2011 15:08:50 -0800
Andrew Morton [off-list ref] wrote:
On Wed, 9 Feb 2011 14:19:50 -0800 (PST)
David Rientjes [off-list ref] wrote:
Completely disabling the oom killer for a memcg is problematic if
userspace is unable to address the condition itself, usually because it
is unresponsive.  This scenario creates a memcg deadlock: tasks are
sitting in TASK_KILLABLE waiting for the limit to be increased, a task to
exit or move, or the oom killer reenabled and userspace is unable to do
so.

An additional possible use case is to defer oom killing within a memcg
for a set period of time, probably to prevent unnecessary kills due to
temporary memory spikes, before allowing the kernel to handle the
condition.

This patch adds an oom killer delay so that a memcg may be configured to
wait at least a pre-defined number of milliseconds before calling the oom
killer.  If the oom condition persists for this number of milliseconds,
the oom killer will be called the next time the memory controller
attempts to charge a page (and memory.oom_control is set to 0).  This
allows userspace to have a short period of time to respond to the
condition before deferring to the kernel to kill a task.

Admins may set the oom killer delay using the new interface:

	# echo 60000 > memory.oom_delay_millisecs

This will defer oom killing to the kernel only after 60 seconds has
elapsed by putting the task to sleep for 60 seconds.  When setting
memory.oom_delay_millisecs, all pending delays have their charges retried
and, if necessary, the new delay is then enforced.

The delay is cleared the first time the memcg is oom to avoid unnecessary
waiting when userspace is unresponsive for future oom conditions.  It may
be set again using the above interface to enforce a delay on the next
oom.

When a memory.oom_delay_millisecs is set for a cgroup, it is propagated
to all children memcg as well and is inherited when a new memcg is
created.
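
The semantics described in the patch text above can be sketched as a small userspace model. This is not the kernel patch itself: `struct memcg_model` and every function name below are hypothetical, and the choice to clear the delay only on the memcg that hit the oom (not on its children) is an assumption the description does not settle.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one memcg; not the kernel's struct mem_cgroup. */
struct memcg_model {
    struct memcg_model *parent;
    unsigned long oom_delay_ms;  /* models memory.oom_delay_millisecs; 0 = no delay */
    bool oom_kill_disable;       /* models memory.oom_control set to 1 */
};

/* A newly created memcg inherits the delay currently set on its parent. */
void memcg_model_init(struct memcg_model *cg, struct memcg_model *parent)
{
    cg->parent = parent;
    cg->oom_delay_ms = parent ? parent->oom_delay_ms : 0;
    cg->oom_kill_disable = false;
}

/*
 * Models a write to memory.oom_delay_millisecs on cg: the value is applied
 * to cg and to every group in groups[] that has cg as an ancestor.
 */
void memcg_model_set_delay(struct memcg_model *groups[], size_t n,
                           struct memcg_model *cg, unsigned long ms)
{
    for (size_t i = 0; i < n; i++) {
        for (struct memcg_model *p = groups[i]; p; p = p->parent) {
            if (p == cg) {
                groups[i]->oom_delay_ms = ms;
                break;
            }
        }
    }
}

/*
 * Called when the memcg first hits oom. Returns how long the task would
 * sleep before its charge is retried, and clears the delay so that later
 * ooms do not wait again (assumption: only this memcg is cleared).
 */
unsigned long memcg_model_oom_wait(struct memcg_model *cg)
{
    unsigned long ms = cg->oom_delay_ms;

    cg->oom_delay_ms = 0;
    return ms;
}

/* After the wait, the kernel kills only if still oom and oom_control is 0. */
bool memcg_model_should_kill(const struct memcg_model *cg, bool still_oom)
{
    return still_oom && !cg->oom_kill_disable;
}
```

In this model, a second call to `memcg_model_oom_wait()` on the same group returns 0 until the delay is written again, which mirrors the "cleared the first time the memcg is oom" behaviour described above.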

Your patch still stinks!

If userspace can't handle a disabled oom-killer then userspace
shouldn't have disabled the oom-killer.

How do we fix this properly?

A little birdie tells me that the offending userspace oom handler is
running in a separate memcg and is not itself running out of memory. 
The problem is that the userspace oom handler is also taking peeks into
processes which are in the stressed memcg and is getting stuck on
mmap_sem in the procfs reads.  Correct?

Hmm, I think memcg's oom-kill just happens under down_read(mmap_sem).
And all tasks that are under oom will be waiting on the wait queue.


Thanks,
-Kame
