Thread (24 messages) 24 messages, 7 authors, 2012-07-30

Re: [PATCH] cgroup: Don't drop the cgroup_mutex in cgroup_rmdir

From: Tejun Heo <hidden>
Date: 2012-07-19 16:50:53
Also in: linux-mm

On Thu, Jul 19, 2012 at 07:39:32PM +0530, Aneesh Kumar K.V wrote:
From: "Aneesh Kumar K.V" <redacted>

We dropped cgroup mutex, because of a deadlock between memcg and cpuset.
cpuset took hotplug lock followed by cgroup_mutex, where as memcg pre_destroy
did lru_add_drain_all() which took hotplug lock while already holding
cgroup_mutex. The deadlock is explained in 3fa59dfbc3b223f02c26593be69ce6fc9a940405
But dropping cgroup_mutex in cgroup_rmdir also means tasks could get
added to cgroup while we are in pre_destroy. This makes error handling in
pre_destroy complex. So move the unlock/lock to memcg pre_destroy callback.
Core cgroup will now call pre_destroy with cgroup_mutex held.

Signed-off-by: Aneesh Kumar K.V <redacted>
I generally think cgroup_mutex shouldn't be dropped across any cgroup
hierarchy changing operation and thus agree with the cgroup core
change.
 static int mem_cgroup_pre_destroy(struct cgroup *cont)
 {
+	int ret;
 	struct mem_cgroup *memcg = mem_cgroup_from_cont(cont);
 
-	return mem_cgroup_force_empty(memcg, false);
+	cgroup_unlock();
+	/*
+	 * we call lru_add_drain_all, which end up taking
+	 * mutex_lock(&cpu_hotplug.lock), But cpuset have
+	 * the reverse order. So drop the cgroup lock
+	 */
+	ret = mem_cgroup_force_empty(memcg, false);
+	cgroup_lock();
+	return ret;
 }
This reverse dependency from cpuset is the same problem Glauber
reported a while ago.  I don't know why / how cgroup_mutex got
exported to outside world but this is asking for trouble.  cgroup
mutex protects cgroup hierarchies.  There are many core subsystems
which implement cgroup controllers.  Controller callbacks for
hierarchy changing oeprations need to synchronize with the rest of the
core subsystems.  So, by design, in locking hierarchy, cgroup_mutex
has to be one of the outermost locks.  If somebody tries to grab it
from inside other core subsystem locks, there of course will be
circular locking dependencies.

So, Peter, why does cpuset mangle with cgroup_mutex?  What guarantees
does it need?  Why can't it work on "changed" notification while
caching the current css like blkcg does?

Let's please unexport cgroup_mutex.  It's moronic to expose that
outside cgroup.

Thanks.

-- 
tejun
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help