Thread (71 messages) 71 messages, 10 authors, 2015-10-27

Re: [PATCH 3/3] sched: Implement interface for cgroup unified hierarchy

From: Tejun Heo <hidden>
Date: 2015-08-24 22:19:41
Also in: lkml

Possibly related (same subject, not in this thread)

Hey,

On Mon, Aug 24, 2015 at 02:58:23PM -0700, Paul Turner wrote:
quoted
Why isn't it?  Because the programs themselves might try to override
it?
The major reasons are:

1) Isolation.  Doing everything with sched_setaffinity means that
programs can use arbitrary resources if they desire.
  1a) These restrictions need to also apply to threads created by
library code.  Which may be 3rd party.
2) Interaction between cpusets and sched_setaffinity.  For necessary
reasons, a cpuset update always overwrites all extant
sched_setaffinity values. ...And we need some cpusets for (1)....And
we need periodic updates for access to shared cores.
This is an erratic behavior on cpuset's part tho.  Nothing else
behaves this way and it's borderline buggy.
3) Virtualization of CPU ids.  (Multiple applications all binding to
core 1 is a bad thing.)
This is about who's setting the affinity, right?  As long as an agent
which knows system details sets it, which mechanism doesn't really
matter.
quoted
quoted
Let me try to restate:
  I think that we can specify the usage is specifically niche that it
will *typically* be used by higher level management daemons which
I really don't think that's the case.
Can you provide examples of non-exceptional usage in this fashion?
I heard of two use cases.  One is sytem-partitioning that you're
talking about and the other is preventing threads of the same process
from stepping on each other's toes.  There was a fancy word for the
cacheline cannibalizing behavior which shows up in those scenarios.
quoted
It's more like there are two niche sets of use cases.  If a
programmable interface or cgroups has to be picked as an exclusive
alternative, it's pretty clear that programmable interface is the way
to go.
I strongly disagree here:
  The *major obvious use* is partitioning of a system, which must act
I don't know.  Why is that more major obvious use?  This is super
duper fringe to begin with.  It's like tallying up beans.  Sure, some
may be taller than others but they're all still beans and I'm not even
sure there's a big difference between the two use cases here.
on groups of processes.  Cgroups is the only interface we have which
satisfies this today.
Well, not really.  cgroups is more convenient / better at these things
but not the only way to do it.  People have been doing isolation to
varying degrees with other mechanisms for ages.
quoted
Ditto.  Wouldn't it be better to implement something which resemables
conventional programming interface rather than contorting the
filesystem semantics?
Maybe?  This is a trade-off, some of which is built on the assumptions
we're now debating.

There is also value, cost-wise, in iterative improvement of what we
have today rather than trying to nuke it from orbit.  I do not know
which of these is the right choice, it likely depends strongly on
where we end up for sub-process interfaces.  If we do support those
I'm not sure it makes sense for them to have an entirely different API
from process-level coordination, at which point the file-system
overload is a trade-off rather than a cost.
Yeah, I understand the similarity part but don't buy that the benefit
there is big enough to introduce a kernel API which is expected to be
used by individual programs which is radically different from how
processes / threads are organized and applications interact with the
kernel.  These are a lot more grave issues and if we end up paying
some complexity from kernel side internally, so be it.
quoted
So, except for cpuset, this doesn't matter for controllers.  All
limits are hierarchical and that's it.
Well no, it still matters because I might want to lower the limit
below what children have set.
All controllers only get what their ancestors can hand down to them.
That's basic hierarchical behavior.
quoted
For cpuset, it's tricky
because a nested cgroup might end up with no intersecting execution
resource.  The kernel can't have threads which don't have any
execution resources and the solution has been assuming the resources
from higher-ups till there's some.  Application control has always
behaved the same way.  If the configured affinity becomes empty, the
scheduler ignored it.
Actually no, any configuration change that would result in this state
is rejected.

It's not possible to configure an empty cpuset once tasks are in it,
or attach tasks to an empty set.
It's also not possible to create this state using setaffinity, these
restrictions are always over-ridden by updates, even if they do not
need to be.
So, even in traditional hierarchies, this isn't true.  You can get to
no-resource config through cpu hot-unplug and cpuset currently ejects
tasks to the closest ancestor with execution resources.
quoted
Because the details on this particular issue can be hashed out in the
future?  There's nothing permanently blocking any direction that we
might choose in the future and what's working today will keep working.
Why block the whole thing which can be useful for the majority of use
cases for this particular corner case?
Because I do not think sub-process hierarchies are the corner case
that you're making them out to be for these controllers and that has
real implications for the ultimate direction of this interface.
If that's the case and we fail miserably at creating a reasonable
programming interface for that, we can always revive thread
granularity.  This is mostly a policy decision after all.
Also.  If we are making disruptive changes here, I would want to
discuss merging cpu, cpuset, and cpuacct.  What this merge looks like
depends on the above.
So, the proposed patches already merge cpu and cpuacct, at least in
appearance.  Given the kitchen-sink nature of cpuset, I don't think it
makes sense to fuse it with cpu.

Thanks.

-- 
tejun
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help