[PATCH] sched: support dynamiQ cluster

From: vincent.guittot@linaro.org (Vincent Guittot)
Date: 2018-04-09 07:34:25
Also in: lkml

Hi Morten,

On 6 April 2018 at 14:58, Morten Rasmussen [off-list ref] wrote:

On Thu, Apr 05, 2018 at 06:22:48PM +0200, Vincent Guittot wrote:

Hi Morten,

On 5 April 2018 at 17:46, Morten Rasmussen [off-list ref] wrote:

On Wed, Apr 04, 2018 at 03:43:17PM +0200, Vincent Guittot wrote:

On 4 April 2018 at 12:44, Valentin Schneider [off-list ref] wrote:

[snip]

What I meant was that if the task composition changes, IOW we mix "small"
tasks (e.g. periodic stuff) and "big" tasks (performance-sensitive stuff like
sysbench threads), we shouldn't assume all of those require to run on a big
CPU. The thing is, ASYM_PACKING can't make the difference between those, so

That's the first point where I tend to disagree: why would big cores be
only for long-running tasks, and why couldn't periodic stuff need to run
on big cores to get max compute capacity?
You make the assumption that only long-running tasks need high compute
capacity. This patch wants to always provide max compute capacity to
the system, and not only to long-running tasks.

There is no way we can tell whether a periodic or short-running task
requires the compute capacity of a big core or not based on utilization
alone. The utilization can only tell us if a task could potentially use
more compute capacity, i.e. the utilization approaches the compute
capacity of its current cpu.

How we handle low utilization tasks comes down to how we define
"performance" and if we care about the cost of "performance" (e.g.
energy consumption).

Placing a low utilization task on a little cpu should always be fine
from _throughput_ point of view. As long as the cpu has spare cycles it

I disagree: throughput is not only a matter of spare cycles, it's also a
matter of how fast you compute the work, with IO activity as an
example.

From a cpu-centric point of view it is, but I agree that from an
application/user point of view completion time might impact throughput
too, for example if your throughput depends on how fast you can
offload work to some peripheral device (a GPU, for example).

However, as I said in the beginning we don't know what the task does.

I agree, but that's not what you do with misfit, as you assume that
long-running tasks have higher priority than short-running tasks.

means that work isn't piling up faster than it can be processed.
However, from a _latency_ (completion time) point of view it might be a
problem, and for latency sensitive tasks I can agree that going for max
capacity might be better choice.

The misfit patches place tasks based on utilization to ensure that
tasks get the _throughput_ they need if possible. This is in line with
the placement policy we have in select_task_rq_fair() already.

We shouldn't forget that what we are discussing here is the default
behaviour when we don't have sufficient knowledge about the tasks in the
scheduler. So we are looking for a reasonable middle-of-the-road policy that
doesn't kill your performance or the battery. If user-space has its own

But misfit task kills performance and might also kill your battery, as
it doesn't prevent small tasks from running on big cores.

As I said, it is not perfect for all use-cases; it is a middle-of-the-road
approach. But I strongly disagree that it is always a bad choice for

mmh ... I never said that it's always a bad choice; I said that it can
also easily make a bad choice and kill performance and/or battery. In
fact, we can't really predict the behavior of the system, as
short-running tasks can be randomly put on big or little cores, and
random behaviors are impossible to predict and mitigate.

both energy and performance as you suggest. ASYM_PACKING doesn't
guarantee max "throughput" (by your definition) either as you may fill
up your big cores with smaller tasks leaving the big tasks behind on
little cpus.

You didn't understand the point here. Asym ensures the max throughput
for the system because it provides the max compute capacity per
second to the whole system and not only to some specific tasks. You
assume that long-running tasks must run on big cores and short-running
tasks must not. But why is filling a big core with long-running tasks
and a little core with short-running tasks the best choice?
Why shouldn't the opposite be better, as long as the big core is fully
used? The goal is to keep big CPUs used whatever the type of tasks.
Then, there are other mechanisms like cgroups to help sort groups of
tasks.
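
To make the "keep the big CPUs used whatever the type of tasks" idea
concrete, here is a toy model in Python. It is not kernel code: the
function name, the (cpu_id, capacity) tuples and the round-robin handling
of surplus tasks are all assumptions made for illustration. It only shows
that tasks land on the highest-capacity CPUs first, without looking at
task utilization, while still spreading across CPUs.

```python
def asym_pack(tasks, cpus):
    """Toy placement: one task per CPU, highest capacity first.

    tasks: list of task names (utilization is deliberately ignored).
    cpus:  list of (cpu_id, capacity) tuples; values are hypothetical.
    Returns a dict mapping cpu_id -> list of task names.
    """
    # Highest capacity first; sorted() is stable, so ties keep input order.
    order = sorted(cpus, key=lambda c: -c[1])
    placement = {cpu_id: [] for cpu_id, _ in cpus}
    for i, task in enumerate(tasks):
        # Assumption: surplus tasks wrap around in the same order.
        cpu_id = order[i % len(order)][0]
        placement[cpu_id].append(task)
    return placement


# Two little CPUs (capacity 430) and two big CPUs (capacity 1024).
cpus = [(0, 430), (1, 430), (2, 1024), (3, 1024)]
placement = asym_pack(["a", "b", "c"], cpus)
```

With three tasks, the two big CPUs are filled first and the third task
goes to a little CPU rather than being stacked on a big one.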

You try to partially do 2 things at the same time

The default behavior of the scheduler is to provide max _throughput_
not middle performance and then side activity can mitigate the power
impact like frequency scaling or like EAS which tries to optimize the
usage of energy when system is not overloaded.

That view doesn't fit very well with all activities around integrating
cpufreq and the scheduler. Frequency scaling is an important factor in
optimizing the throughput.

Here you didn't catch my point either. Please don't attribute to me
intentions that I don't have.
By side activity, I'm not saying that it should not consolidate the
cpufreq and other framework decisions. The scheduler is the best place to
consolidate CPU-related decisions. I'm just saying that it's an
additional action taken to optimize energy.
The scheduler doesn't use the current frequency in task placement and load
balancing, as it assumes that max throughput is available if needed and
adjusts frequency to current needs.

With misfit task, you
make the assumption that a short task on a little core is the best
placement even from a performance PoV.

I never said it was the best placement, I said it was a reasonable
default policy for big.LITTLE systems.

But "The primary job for the task scheduler is to deliver the highest
possible throughput with minimal latency."

It seems that you make
some power/performance assumptions without using an energy model, which
is what can make such decisions. That is the whole point of EAS.

I'm trying to see the bigger picture where you seem not to. The

Thanks for helping me to get the bigger picture ;-)

ASYM_PACKING solution is incompatible with EAS. CFS has a cpu centric
view and the default policy I'm suggesting doesn't violate that view.

Sorry, I don't follow the sentences above.

Your own code in group_is_overloaded() follows this view as it is
utilization based and happily accepts partially utilized groups as being

But this is done for SMP systems where all cores have the same capacity,
and to detect when tasks can get more throughput on another CPU.
ASYM_PACKING is there to add capacity awareness to the load balance
when CPUs have different capacities.

fine without needing to be offloaded even though you could have multiple
tasks waiting to execute.
CFS doesn't provide any latency guarantees, but
we of course do the best we can within reason to minimize it.

Seen in the bigger picture I would consider going for max capacity for
big.LITTLE systems more aggressive than using the performance cpufreq
governor. Nobody does the latter for battery powered devices, hence I
don't see why anyone would go big-always for big.LITTLE systems.

And that's why EAS exists: to make battery friendly decision

opinion about performance requirements it is free to use task affinity
to control which cpu the task ends up on and ensure that the task gets
max capacity always. On top of that we have had interfaces in Android
for years to specify performance requirements for task (groups) to allow
small tasks to be placed on big cpus and big task to be placed on little
cpus depending on their requirements. It is even tied into cpufreq as
well. A lot of effort has gone into Android to get this balance right.
Patrick is working hard on upstreaming some of those features.

In the bigger picture always going for max capacity is not desirable for
a well-configured big.LITTLE system. You would never exploit the advantage
of the little cpus as you always use big first and only use little when
the bigs are overloaded at which point having little cpus at all makes

If I'm not wrong, the misfit task patchset doesn't prevent little tasks
from running on big cores.

It does not, in fact it doesn't touch small tasks at all, that is not
the point of the patch set. The point is to make sure that big tasks
don't get stuck on little cpus. IOW, a selective little to big
migration based on task utilization.
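
As a rough illustration of "selective little to big migration based on
task utilization", here is a toy Python sketch. It is not the patch set's
code: the function names, the task list and the little-CPU capacity of
430 are hypothetical, and the 1280/1024 margin (about 20% headroom) is an
assumption chosen for the example.

```python
# Toy model only; all constants below are assumptions for illustration.
SCHED_CAPACITY_SCALE = 1024   # capacity of the biggest CPU on this scale
CAPACITY_MARGIN = 1280        # assumed headroom factor (1280/1024 = 1.25)


def task_fits(task_util, cpu_capacity):
    """True if the task's utilization leaves headroom on this CPU."""
    return cpu_capacity * SCHED_CAPACITY_SCALE > task_util * CAPACITY_MARGIN


def find_misfits(tasks, cpu_capacity):
    """Return names of tasks that do not fit; small tasks are left alone."""
    return [name for name, util in tasks if not task_fits(util, cpu_capacity)]


# Hypothetical tasks currently running on a little CPU of capacity 430.
little_tasks = [("periodic", 60), ("sysbench", 400)]
misfits = find_misfits(little_tasks, 430)
```

Only the task whose utilization approaches the little CPU's capacity is
flagged for migration; the small periodic task is not touched.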

little sense. Vendors build big.LITTLE systems because they want a
better performance/energy trade-off, if they wanted max capacity always,
they would just build big-only systems.

And that's the whole purpose of the EAS patchset. The EAS patchset is
there to put some energy awareness into the scheduler's decisions. There
are 2 running modes for EAS: one when there are spare cycles, so tasks can
be placed to optimize energy consumption, and one when the system or part
of the system is overloaded and it goes back to the default performance
mode, because there is no interest in energy efficiency and we just
want to provide max performance. So asym packing fits with this
latter mode, as it provides the max compute capacity to the default mode
and doesn't break EAS, as it uses the load balance, which is disabled by
EAS in non-overloaded mode.
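
The two running modes described above can be sketched as a toy Python
model. This is only an illustration of the description in this mail, not
the EAS implementation: the function names, the mode labels and the
1280/1024 margin are assumptions.

```python
def cpu_overutilized(util, capacity, margin=1280, scale=1024):
    """True if utilization exceeds capacity minus an assumed ~20% headroom."""
    return capacity * scale < util * margin


def placement_mode(cpus):
    """Pick the mode for a list of (utilization, capacity) pairs.

    Returns "energy_aware" while every CPU has spare cycles, and
    "load_balance" as soon as any CPU is overutilized.
    """
    if any(cpu_overutilized(util, cap) for util, cap in cpus):
        return "load_balance"
    return "energy_aware"


# Hypothetical system: one little CPU (capacity 430), one big CPU (1024).
relaxed = placement_mode([(100, 430), (200, 1024)])
stressed = placement_mode([(400, 430), (200, 1024)])
```

In this model a single overutilized little CPU is enough to switch the
whole system from energy-aware placement to regular load balancing.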

We still care about energy even when we are overutilized. We really
don't want a vastly different placement policy depending on whether we
are overutilized or not if we can avoid it as the situation changes
frequently in many real world scenarios. With ASYM_PACKING everything
could suddenly shift to big cpus if a little cpu is suddenly
overutilized. With the misfit patches, we would detect exactly which

Not everything. The same happens with ASYM_PACKING. It doesn't blindly
put everything on "big" cores and does use parallelism too.

Regards,
Vincent

little cpu that needs help, migrate the misfit task and everything will
return to non-overutilized. That is why I said that ASYM_PACKING is
incompatible with energy-aware scheduling and we would need the misfit
patches anyway.

Morten
