Re: [RFC] cgroup TODOs

From: Li Zefan <hidden>
Date: 2012-09-14 09:12:59
Also in: lkml

Possibly related (same subject, not in this thread)

2012-09-21 · Re: [RFC] cgroup TODOs · Tejun Heo <tj@kernel.org>
2012-09-20 · Re: [RFC] cgroup TODOs · Andy Lutomirski <luto@amacapital.net>
2012-09-20 · Re: [RFC] cgroup TODOs · Tejun Heo <tj@kernel.org>
2012-09-20 · Re: [RFC] cgroup TODOs · Andy Lutomirski <luto@amacapital.net>
2012-09-19 · Re: [RFC] cgroup TODOs · Michal Hocko <hidden>

quoted

2. memcg's __DEPRECATED_clear_css_refs

  This is a remnant of another weird design decision of requiring
  synchronous draining of refcnts on cgroup removal and allowing
  subsystems to veto cgroup removal - what's the userspace supposed to
  do afterwards?  Note that this also hinders co-mounting different
  controllers.

  The behavior could be useful for development and debugging but it
  unnecessarily interlocks userland visible behavior with in-kernel
  implementation details.  To me, it seems outright wrong (either
  implement proper severing semantics in the controller or do full
  refcnting) and disallows, for example, lazy drain of caching refs.
  Also, it complicates the removal path with try / commit / revert
  logic which has never been fully correct since the beginning.

  Currently, the only left user is memcg.

  Solution:

  * Update memcg->pre_destroy() such that it never fails.

  * Drop __DEPRECATED_clear_css_refs and all related logic.
    Convert pre_destroy() to return void.

  Who:

  KAMEZAWA, Michal, PLEASE.  I will make __DEPRECATED_clear_css_refs
  trigger WARN sooner or later.  Let's please get this settled.

3. cgroup_mutex usage outside cgroup core

  This is another thing which is simply broken.  Given the way cgroup
  is structured and used, nesting cgroup_mutex inside any other
  commonly used lock simply doesn't work - it's held while invoking
  controller callbacks which then interact and synchronize with
  various core subsystems.

  There are currently three external cgroup_mutex users - cpuset,
  memcontrol and cgroup_freezer.

  Solution:

  Well, we should just stop doing it - use a separate nested lock
  (which seems possible for cgroup_freezer) or track and mange task
  in/egress some other way.

  Who:

  I'll do the cgroup_freezer.  I'm hoping PeterZ or someone who's
  familiar with the code base takes care of cpuset.  Michal, can you
  please take care of memcg?

I think this is a pressing problem, yes, but not the only problem with
cgroup lock. Even if we restrict its usage to cgroup core, we still can
call cgroup functions, which will lock. And then we gain nothing.

Agreed. The biggest issue in cpuset is if hotplug makes a cpuset's cpulist
empty the tasks in it will be moved to an ancestor cgroup, which requires
holding cgroup lock. We have to either change cpuset's behavior or eliminate
the global lock.

And the problem is that people need to lock. cgroup_lock is needed
because the data you are accessing is protected by it. The way I see it,
it is incredible how we were able to revive the BKL in the form of
cgroup_lock after we finally manage to successfully get rid of it!

We should just start to do a more fine grained locking of data, instead
of "stop the world, cgroup just started!". If we do that, the problem
you are trying to address here will even cease to exist.

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help