Re: [Documentation] State of CPU controller in cgroup v2

From: Mike Galbraith <hidden>
Date: 2016-08-11 06:25:12
Also in: linux-api, lkml

On Wed, 2016-08-10 at 18:09 -0400, Johannes Weiner wrote:

The complete lack of cohesiveness between v1 controllers prevents us
from implementing even the most fundamental resource control that
cloud fleets like Google's and Facebook's are facing, such as
controlling buffered IO; attributing CPU cycles spent receiving
packets, reclaiming memory in kswapd, encrypting the disk; attributing
swap IO etc. That's why cgroup2 runs a tighter ship when it comes to
the controllers: to make something much bigger work.

Where is the gun wielding thug forcing people to place tasks where v2
now explicitly forbids them?

Agreeing on something - in this case a common controller model - is
necessarily going to take away some flexibility from how you approach
a problem. What matters is whether the problem can still be solved.

What annoys me about this more than the seemingly gratuitous breakage
is that the decision is passed to third parties who have nothing to
lose, and have done quite a bit of breaking lately.

This argument that cgroup2 is not backward compatible is laughable.

Fine, you're entitled to your sense of humor.  I have one too; I find it
laughable that threaded applications can only sit there like a lump of
mud simply because they share more than applications written as a
gaggle of tasks.  "Threads are like.. so yesterday, the future belongs
to the process" tickles my funny-bone.  Whatever, to each his own.

...

Lastly, again - and this was the whole point of this document - the
changes in cgroup2 are not gratuitous. They are driven by fundamental
resource control problems faced by more comprehensive applications of
cgroup. On the other hand, the opposition here mainly seems to be the
inconvenience of switching some specialized setups from a v1-oriented
way of solving a problem to a v2-oriented way.

[ That, and a disturbing number of emotional outbursts against
  systemd, which has nothing to do with any of this. ]

It's a really myopic line of argument.

And I think the myopia is on the other side of my monitor, whatever.

That being said, let's go through your points:

quoted

Priority and affinity are not process wide attributes, never have
been, but you're insisting that so they must become for the sake of
progress.

Not really.

It's just questionable whether the cgroup interface is the best way to
manipulate these attributes, or whether existing interfaces like
setpriority() and sched_setaffinity() should be extended to manipulate
groups, like the rgroup proposal does. The problems of using the
cgroup interface for this are extensively documented, including in the
email you were replying to.
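
[ Illustration, not part of the original exchange: a minimal Python
  sketch of the point that CPU affinity on Linux is a per-thread
  attribute. It assumes a Linux host where the caller is allowed to
  narrow its own CPU mask. The helper names pin_current_thread() and
  demo() are hypothetical; os.sched_setaffinity() with pid 0 acts on
  the calling thread only. ]

```python
import os
import threading


def pin_current_thread(cpu):
    """Restrict only the calling thread to a single CPU and return its new mask."""
    # On Linux, pid 0 means "the calling thread", not the whole process.
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)


def demo():
    """Pin a worker thread and show that the main thread's mask is untouched."""
    before = os.sched_getaffinity(0)   # main thread's mask before the worker runs
    target = min(before)               # pick any CPU the process may already use
    result = {}

    def worker():
        result["worker"] = pin_current_thread(target)

    t = threading.Thread(target=worker)
    t.start()
    t.join()

    after = os.sched_getaffinity(0)    # main thread's mask after the worker ran
    return before, result["worker"], after


if __name__ == "__main__":
    before, worker_mask, after = demo()
    print("main before:", sorted(before))
    print("worker     :", sorted(worker_mask))
    print("main after :", sorted(after))
```

[ The worker ends up with a one-CPU mask while the main thread keeps
  the mask it started with, which is the per-thread behaviour both
  sides of this argument take for granted. ]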

quoted

I mentioned a real world case of a thread pool servicing customer
accounts by doing something quite sane: hop into an account (cgroup),
do work therein, send bean count off to the $$ department, wash, rinse,
repeat.  That's real world users making real world cash registers go
ka-ching so real world people can pay their real world bills.
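
[ Illustration, not part of the original exchange: in cgroup v1 a
  single thread is moved by writing its TID to the target group's
  "tasks" file, which is roughly what a pool worker "hopping into an
  account" amounts to. The sketch below is an assumption about how
  such a hop could look; the function name hop_into() and the
  directory layout are hypothetical, and the message does not describe
  the actual implementation. ]

```python
import os
import threading


def hop_into(cgroup_dir, tid=None):
    """Move one thread into a cgroup v1 group by writing its TID to 'tasks'.

    cgroup_dir is the directory of the target group (hypothetical layout,
    e.g. one directory per customer account). If tid is None, the calling
    thread's kernel thread id is used.
    """
    if tid is None:
        tid = threading.get_native_id()
    # cgroup v1 'tasks' takes one thread id per write.
    with open(os.path.join(cgroup_dir, "tasks"), "w") as f:
        f.write(str(tid))
    return tid
```

[ A worker would call hop_into() with the account's group directory
  before doing work for that account. Each call is a task migration,
  so the per-hop cost depends on how often workers switch accounts. ]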

Sure, but you're implying that this is the only way to run this real
world cash register.

I implied no such thing.  Of course it can be done differently, all
they have to do is rip out these archaic thread thingies.

Apologies for dripping sarcasm all over your monitor, but this annoys
me far more than it should any casual user of cgroups.  Perhaps I
shouldn't care about the users (suse customers) who will step in this
eventually, but I do.

I'm not going down the rabbit hole again of arguing against an
incomplete case description. Scale matters. The number of workers
matters. The amount of work each thread does matters to evaluate
transaction overhead. Task migration is an expensive operation etc.

quoted

I also mentioned breakage to cpusets: given exclusive set A and
exclusive subset B therein, there is one and only one spot where
affinity A exists... at the to be forbidden junction of A and B.

Again, a means to an end rather than a goal

I don't believe I described a means to an end, I believe I described
affinity bits going missing.

 - and a particularly
suspicious one at that: why would a cgroup need to tell its *siblings*
which cpus/nodes it cannot use? In the hierarchical model, it's
clearly the task of the ancestor to allocate the resources downward.

More details would be needed to properly discuss what we are trying to
accomplish here.

quoted

As with the thread pool, process granularity makes it impossible for
any threaded application affinity to be managed via cpusets, such as
say stuffing realtime critical threads into a shielded cpuset, mundane
threads into another.  There are any number of affinity usages that
will break.

Ditto. It's not obvious why this needs to be the cgroup interface and
couldn't instead be solved with extending sched_setaffinity() - again
weighing that against the power of the common controller model that
could be preserved this way.

Wow.  Well sure, anything that becomes broken can be replaced by
something else.  Hell, people can just stop using cgroups entirely, and
the way issues become non-issues with the wave of a hand makes me
suspect that some users are going to be forced to do just that.

	-Mike
