Thread: 40 messages, 8 authors, 2012-07-19

Re: [rfc][patch 3/3] mm, memcg: introduce own oom handler to iterate only over its own threads

From: David Rientjes <rientjes@google.com>
Date: 2012-06-27 05:35:42
Also in: cgroups

On Tue, 26 Jun 2012, David Rientjes wrote:
> It's still not a perfect solution for the above reason.  We need 
> tasklist_lock for oom_kill_process() for a few reasons:
> 
>  (1) if /proc/sys/vm/oom_dump_tasks is enabled, which is the default, 
>      to iterate the tasklist
> 
>  (2) to iterate the selected process's children, and
> 
>  (3) to iterate the tasklist to kill all other processes sharing the 
>      same memory.
> 
> I'm hoping we can avoid taking tasklist_lock entirely for memcg ooms to 
> avoid the starvation problem at all.  We definitely still need to do (3) 
> to avoid mm->mmap_sem deadlock if another thread sharing the same memory 
> is holding the semaphore trying to allocate memory and waiting for current 
> to exit, which needs the semaphore itself.  That can be done with 
> rcu_read_lock(), however, and doesn't require tasklist_lock.
> 
> (1) can be done with rcu_read_lock() as well but I'm wondering if there 
> would be a significant advantage doing this by a cgroup iterator as well.  
> It may not be worth it just for the sanity of the code.
> 
> We can do (2) if we change to list_for_each_entry_rcu().

It turns out that task->children is not an rcu-protected list so this 
doesn't work.  Both (1) and (3) can be accomplished with 
rcu_read_{lock,unlock}() that can nest inside the tasklist_lock for the 
global oom killer.  (We could even split the global oom killer tasklist 
locking and optimize it separately from this patchset.)
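
To make (3) concrete, here is a userspace toy model of the "kill everything else sharing the victim's memory" walk.  It is only a sketch: struct toy_task, struct toy_mm and toy_kill_mm_sharers() are hypothetical stand-ins invented for illustration, not the kernel's task_struct or any existing kernel function, and a plain array stands in for the tasklist.  The comments mark where the real walk would sit inside rcu_read_lock()/rcu_read_unlock().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel structures; not the real task_struct. */
struct toy_mm {
	int id;
};

struct toy_task {
	int pid;
	int tgid;		/* thread group id */
	struct toy_mm *mm;	/* NULL models a task with no user memory */
	bool killed;		/* models "SIGKILL has been sent" */
};

/*
 * Mark every task that shares the victim's mm but lives in a different
 * thread group as killed, and return how many were marked.
 *
 * In the kernel the loop body would run under rcu_read_lock(); in this
 * toy model the array is private, so no locking is needed.
 */
size_t toy_kill_mm_sharers(struct toy_task *tasks, size_t n,
			   const struct toy_task *victim)
{
	size_t count = 0;

	assert(victim != NULL && victim->mm != NULL);

	/* rcu_read_lock() would go here in kernel code */
	for (size_t i = 0; i < n; i++) {
		struct toy_task *q = &tasks[i];

		if (q->mm != victim->mm)
			continue;	/* different (or no) address space */
		if (q->tgid == victim->tgid)
			continue;	/* same thread group as the victim */
		if (q->killed)
			continue;	/* already marked */
		q->killed = true;
		count++;
	}
	/* rcu_read_unlock() would go here in kernel code */

	return count;
}
```

The point of the model is that the walk only reads each task's mm and thread group, which is why a read-side RCU section is enough for it.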

So we have a couple of options:

 - allow oom_kill_process() to do

	if (memcg)
		read_lock(&tasklist_lock);
	...
	if (memcg)
		read_unlock(&tasklist_lock);

   around the iteration over the victim's children.  This should solve the 
   issue since any other iteration over the entire tasklist would have 
   triggered the same starvation if it were that bad, or

 - suppress the iteration for memcg ooms and just kill the parent instead.
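
The difference between the two options can be sketched in userspace as follows.  This is a toy model under stated assumptions: struct toy_proc, its points field and toy_pick_victim() are hypothetical names made up for this example, and the child-selection rule (prefer the highest-scoring child that does not share the parent's memory) is a simplification of what oom_kill_process() does.  The skip_children flag models the second option, where the iteration is suppressed and the parent is killed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a process; not the kernel's task_struct. */
struct toy_proc {
	int pid;
	int mm_id;			/* equal ids model a shared mm */
	unsigned int points;		/* toy badness score */
	const struct toy_proc *children;
	size_t nr_children;
};

/*
 * Pick the process to kill.
 *
 * skip_children == false: walk the children (the part that would need
 * tasklist_lock in the kernel) and prefer the highest-scoring child
 * that does not share the parent's memory.
 *
 * skip_children == true: do not walk the children, kill the parent.
 */
const struct toy_proc *toy_pick_victim(const struct toy_proc *p,
				       bool skip_children)
{
	const struct toy_proc *victim = p;
	unsigned int victim_points = 0;

	assert(p != NULL);

	if (skip_children)
		return p;

	/* read_lock(&tasklist_lock) would go here in kernel code */
	for (size_t i = 0; i < p->nr_children; i++) {
		const struct toy_proc *child = &p->children[i];

		if (child->mm_id == p->mm_id)
			continue;	/* killing it would not free memory */
		if (child->points > victim_points) {
			victim = child;
			victim_points = child->points;
		}
	}
	/* read_unlock(&tasklist_lock) would go here in kernel code */

	return victim;
}
```

With the first option the child walk stays and only the locking around it changes; with the second the walk disappears for memcg ooms, at the cost of always sacrificing the parent.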

Comments?
