Re: [PATCH 1/2] connector: add cgroup release event report to proc connector

From: Dimitri John Ledkov <hidden>
Date: 2015-05-27 12:37:14
Also in: cgroups

On 27 May 2015 at 12:22, Zefan Li [off-list ref] wrote:
On 2015/5/27 6:07, Dimitri John Ledkov wrote:
Add a kernel API to send a proc connector notification that a cgroup
has become empty. A userspace daemon can then act upon such
information, and usually clean-up and remove such a group as it's no
longer needed.

Currently there are two other ways (one for current & one for unified
cgroups) to receive such notifications, but they either involve
spawning a userspace helper or monitoring a lot of files. This is
instead a firehose of all such events from a single place.

In the current cgroups structure the way to get notifications is by
enabling `release_agent' and setting `notify_on_release' for a given
cgroup hierarchy. This will then spawn a userspace helper with the removed
cgroup as an argument. It has been acknowledged that this is
expensive, especially in exit-heavy workloads. In userspace this
is currently used by systemd and CGmanager that I know of; both
agents establish a connection to the long-running daemon and pass the
message to it. As a courtesy to other processes, such an event is
sometimes forwarded further on, e.g. systemd forwards it to the system
DBus.
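
As a rough illustration of the legacy mechanism described above, the
following sketch writes the two control files involved. The function
name and the agent path used in the usage note are hypothetical; the
file names `release_agent' (in the hierarchy root) and
`notify_on_release' (per cgroup) are the ones named in the text.

```python
from pathlib import Path


def enable_release_notification(hierarchy_root, cgroup, agent_path):
    """Ask a legacy cgroup hierarchy to run agent_path when cgroup empties.

    hierarchy_root: mount point of the legacy hierarchy (assumed to exist).
    cgroup:         path of the cgroup relative to hierarchy_root.
    agent_path:     helper binary the kernel should spawn (hypothetical).
    """
    root = Path(hierarchy_root)
    # The release agent is configured once, in the root of the hierarchy.
    (root / "release_agent").write_text(agent_path + "\n")
    # Notification is then switched on per cgroup.
    (root / cgroup / "notify_on_release").write_text("1\n")
```

For example, enable_release_notification("/sys/fs/cgroup/systemd",
"daemon/job1", "/usr/local/sbin/cgroup-agent") would, on a real mount
and with sufficient privileges, make the kernel spawn the helper once
daemon/job1 becomes empty, which is the per-event process spawn that
the text calls expensive.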

In the future/unified cgroups structure support for `release_agent' is
removed, without a direct replacement. However, there is a new
`cgroup.populated' file exposed that recursively reports if there are
any tasks in a given cgroup hierarchy. It's a very good flag to
quickly/lazily scan for empty things, however one would need to
establish inotify watch on each and every cgroup.populated file at
cgroup setup time (ideally before any pids enter said cgroup). Thus,
again, nobody but the original creator of a given cgroup has a
chance to reliably monitor a cgroup becoming empty (since there is no
reliable recursive inotify watch).
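
The "quickly/lazily scan for empty things" use of cgroup.populated
could look roughly like the sketch below. It assumes each
cgroup.populated file contains a single 0 or 1, and the function name
is made up for illustration. It polls the tree; it does not solve the
notification problem described above.

```python
import os


def find_empty_cgroups(root):
    """Return, sorted, the cgroup directories under root that report 0."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "cgroup.populated" not in filenames:
            continue  # not a cgroup directory exposing the flag
        with open(os.path.join(dirpath, "cgroup.populated")) as flag:
            # Assumption: the file holds "0" (no tasks in this cgroup or
            # below it) or "1" (at least one task somewhere in the subtree).
            if flag.read().strip() == "0":
                empty.append(dirpath)
    return sorted(empty)
```

A daemon that did not create the cgroups can only run such a scan
periodically, which is why the event-driven alternative is being
proposed.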

Hence, the addition to the proc connector firehose. Multiple things
(albeit with CAP_NET_ADMIN in the init pid/user namespace) could
connect and monitor cgroup release notifications. In a way, this
repeats udev history: at first it was a userspace helper, which later
became a netlink socket. And I hope that the proc connector is a
naturally good fit for this notification type.
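
For readers unfamiliar with the proc connector, here is a minimal
sketch of how a listener subscribes to the existing event stream. The
constants are copied by hand from the kernel UAPI headers and the
function names are illustrative. The layout of the cgroup release
event proposed here is defined by the patch itself and is deliberately
not parsed.

```python
import os
import socket
import struct

NETLINK_CONNECTOR = 11    # <linux/netlink.h>
NLMSG_DONE = 3            # <linux/netlink.h>
CN_IDX_PROC = 1           # <linux/connector.h>
CN_VAL_PROC = 1           # <linux/connector.h>
PROC_CN_MCAST_LISTEN = 1  # <linux/cn_proc.h>


def build_listen_message(pid, seq=0):
    """Build the netlink message that turns on proc connector events."""
    # Payload: one enum proc_cn_mcast_op value.
    payload = struct.pack("=I", PROC_CN_MCAST_LISTEN)
    # struct cn_msg: cb_id.idx, cb_id.val, seq, ack, len, flags.
    cn_msg = struct.pack("=IIIIHH", CN_IDX_PROC, CN_VAL_PROC,
                         seq, 0, len(payload), 0) + payload
    # struct nlmsghdr: len, type, flags, seq, pid (16 bytes).
    header = struct.pack("=IHHII", 16 + len(cn_msg), NLMSG_DONE, 0, seq, pid)
    return header + cn_msg


def subscribe():
    """Open a subscribed socket; needs CAP_NET_ADMIN, Linux only."""
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_DGRAM,
                         NETLINK_CONNECTOR)
    sock.bind((os.getpid(), CN_IDX_PROC))  # join the proc multicast group
    sock.send(build_listen_message(os.getpid()))
    return sock
```

Any number of suitably privileged processes can call subscribe() and
read events with sock.recv(), without having been present when a given
cgroup was created.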

For precisely when cgroups should emit this event, see next patch
against kernel/cgroup.c.
We really don't want yet another way for cgroup notification.
We do have multiple information sources for similar events in other
places... e.g. fork events can be tracked with ptrace and with
proc-connector, ditto other things.
Systemd is happy with this cgroup.populated interface. Do you have any
real use case in mind that can't be satisfied with inotify watch?
cgroup.populated is not implemented in systemd and would require a lot
of inotify watches. Also it's only set on the unified structure and
not exposed on the current one.

Also it will not allow anybody else to establish an inotify watch in a
timely manner. Thus anyone external to the cgroups creator will not be
able to monitor cgroup.populated at the right time. With
proc_connector I was thinking processes entering cgroups would be
useful events as well, but I don't have a use-case for them yet, so
I'm not sure what the event should look like.

Would cgroup.populated be exposed on the legacy cgroup hierarchy? At the
moment I see about 20ms of my ~200ms boot wasted on spawning the
cgroups agent and I would like to get rid of that as soon as possible.
This patch solves it for me. (I have a matching one to connect to the
proc connector and then feed notifications to systemd via systemd's
private API end-point.)

Exposing cgroup.populated irrespective of the cgroup mount options
would be great, but would result in many watches being established,
waiting for a once-in-a-lifecycle condition of a cgroup. Imho this is
wasteful, but nonetheless will be much better than spawning the agent.

Would a patch that exposes cgroup.populated on legacy cgroup structure
be accepted? It is forward-compatible after all... or no?

-- 
Regards,

Dimitri.
Pura Vida!

https://clearlinux.org
Open Source Technology Center
Intel Corporation (UK) Ltd. - Co. Reg. #1134945 - Pipers Way, Swindon SN3 1RJ.