Re: [RFC PATCH v2 13/17] cgroup: Allow fine-grained controllers control in cgroup v2

From: Waiman Long <longman@redhat.com>
Date: 2017-05-19 21:20:11
Also in: linux-mm, lkml

On 05/19/2017 04:55 PM, Tejun Heo wrote:

Hello, Waiman.

On Mon, May 15, 2017 at 09:34:12AM -0400, Waiman Long wrote:

For cgroup v1, different controllers can be bound to different cgroup
hierarchies optimized for their own use cases. That is not currently
the case for cgroup v2 where combining all these controllers into
the same hierarchy will probably require more levels than is needed
by each individual controller.

By not enabling a controller in a cgroup and its descendants, we can
effectively trim the hierarchy as seen by a controller from the leaves
up. However, there is currently no way to compress the hierarchy in
the intermediate levels.

This patch implements a fine-grained mechanism to allow a controller to
skip some intermediate levels in a hierarchy and effectively flatten
the hierarchy as seen by that controller.

Controllers can now be directly enabled or disabled in a cgroup
by writing to the "cgroup.controllers" file.  The special prefix
'#' with the controller name is used to set that controller in
pass-through mode.  In that mode, the controller is disabled for that
cgroup but it allows its children to have that controller enabled or
in pass-through mode again.
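
To illustrate the pass-through idea described above, here is a minimal Python sketch of how a controller's view of the hierarchy could be derived. It is only a toy model and not the patch's kernel code: the `Mode` enum, the `Cgroup` class, and `effective_children()` are hypothetical names, and it assumes that a disabled cgroup trims the controller from its whole subtree while a pass-through cgroup hoists its children up to the nearest enabled ancestor.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    """Per-cgroup state of one controller (hypothetical model)."""
    ENABLED = "enabled"
    PASS_THROUGH = "pass-through"   # written with the '#' prefix in the patch
    DISABLED = "disabled"


@dataclass
class Cgroup:
    name: str
    mode: Mode
    children: list[Cgroup] = field(default_factory=list)


def effective_children(cg: Cgroup) -> list[str]:
    """Names of the cgroups the controller sees directly below `cg`."""
    seen = []
    for child in cg.children:
        if child.mode is Mode.ENABLED:
            seen.append(child.name)
        elif child.mode is Mode.PASS_THROUGH:
            # This level is skipped: its children are hoisted up one level.
            seen.extend(effective_children(child))
        # DISABLED: assumed to trim the controller from the whole subtree.
    return seen


root = Cgroup("root", Mode.ENABLED, [
    Cgroup("A", Mode.PASS_THROUGH, [
        Cgroup("A1", Mode.ENABLED),
        Cgroup("A2", Mode.DISABLED),
    ]),
    Cgroup("B", Mode.ENABLED, [Cgroup("B1", Mode.ENABLED)]),
])

print(effective_children(root))  # ['A1', 'B']
```

In this model the controller sees A1 as a direct child of root because the intermediate level A is in pass-through mode, which is the flattening effect the description refers to.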

With this change, each controller can now have its own view of the
virtual process hierarchy, which can be quite different from that of other
controllers.  We now have the freedom and flexibility to create the
right hierarchy for each controller to suit its own needs without
performance loss when compared with cgroup v1.

I can see the appeal but this needs at least more refinements.

This breaks the invariant that in a cgroup its resource control knobs
control distribution of resources from its parent.  IOW, the resource
control knobs of a cgroup always belong to the parent.  This is also
reflected in how delegation is done.  The delegatee assumes ownership
of the cgroup itself and the ability to manage sub-cgroups but doesn't
get the ownership of the resource control knobs as otherwise the
parent would lose control over how it distributes its resources.

One twist I am thinking of is to have a controller enabled by the
parent in subtree_control, but then allow the child to either disable it
or set it in pass-through mode by writing to the controllers file. IOW, a
child cannot enable a controller without parent's permission. Once a
child has permission, it can do whatever it wants. A parent cannot force
a child to have a controller enabled.
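
The permission rule in this proposal can be sketched as a small Python check. This is one possible reading, not an implementation: `allowed_modes()` and `set_mode()` are hypothetical helpers, and the sketch assumes that a child whose parent has not listed the controller in subtree_control can only leave it disabled.

```python
from __future__ import annotations

MODES = ("enabled", "pass-through", "disabled")


def allowed_modes(controller: str, parent_subtree_control: set[str]) -> set[str]:
    """Modes a child may pick for `controller` under the proposed rule."""
    if controller in parent_subtree_control:
        # Parent granted permission: the child can do whatever it wants.
        return set(MODES)
    # Assumption: without permission the controller can only stay disabled.
    return {"disabled"}


def set_mode(controller: str, requested: str,
             parent_subtree_control: set[str]) -> str:
    """Return the accepted mode or raise if the parent did not allow it."""
    if requested not in allowed_modes(controller, parent_subtree_control):
        raise PermissionError(
            f"{controller}: '{requested}' needs the parent's subtree_control")
    return requested


parent = {"cpu", "memory"}                        # parent's subtree_control
print(set_mode("cpu", "pass-through", parent))    # allowed by the parent
print(set_mode("memory", "disabled", parent))     # a child may always opt out
```

Note that the parent only grants permission here. Nothing in the model lets the parent force a child to keep the controller enabled.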

Another aspect is that most controllers aren't that sensitive to
nesting several levels.  Expensive operations can be and already are
aggregated and the performance overhead of several levels of nesting
barely shows up.  Skipping levels can be an interesting optimization
approach and we can definitely support it from the core side; however,
it'd be a lot nicer if we could do that optimization transparently
(e.g. CPU can skip multi level queueing if there usually is only one
item at some levels).

The trend that I am seeing is that the total number of controllers is
going to grow over time. New controllers may be sensitive to the level
of nesting like the cpu controller. I am also thinking about how systemd
is using the cgroup filesystem for task classification purposes without
any controller attached to it. With this scheme, we can accommodate all
the different needs without using different cgroup filesystems.

Hmm... that said, if we can fix the delegation issue in a not-too-ugly
way, why not?  I wonder whether we can still keep the resource control
knobs attached to the parent and skip in the middle.  Topology-wise,
that'd make more sense too.

Let me know what you think about my proposal above.

Cheers,
Longman
