Thread (36 messages) 36 messages, 6 authors, 2014-11-24

[PATCH v6 5/6] sched: replace capacity_factor by usage

From: Morten Rasmussen <hidden>
Date: 2014-10-03 09:35:58
Also in: lkml

On Fri, Oct 03, 2014 at 08:24:23AM +0100, Vincent Guittot wrote:
On 2 October 2014 18:57, Morten Rasmussen [off-list ref] wrote:
quoted
On Tue, Sep 23, 2014 at 05:08:04PM +0100, Vincent Guittot wrote:
quoted
Below are two examples to illustrate the problem that this patch solves:

1 - capacity_factor makes the assumption that max capacity of a CPU is
SCHED_CAPACITY_SCALE and the load of a thread is always is
SCHED_LOAD_SCALE. It compares the output of these figures with the sum
of nr_running to decide if a group is overloaded or not.

But if the default capacity of a CPU is less than SCHED_CAPACITY_SCALE
(640 as an example), a group of 3 CPUS will have a max capacity_factor
of 2 ( div_round_closest(3x640/1024) = 2) which means that it will be
seen as overloaded if we have only one task per CPU.
I did some testing on TC2 which has the setup you describe above on the
A7 cluster when the clock-frequency property is set in DT. The two A15s
have max capacities above 1024. When using sysbench with five threads I
still get three tasks on the two A15s and two tasks on the three A7
leaving one cpu idle (on average).

Using cpu utilization (usage) does correctly identify the A15 cluster as
overloaded and it even gets as far as selecting the A15 running two
tasks as the source cpu in load_balance(). However, load_balance() bails
out without pulling the task due to calculate_imbalance() returning a
zero imbalance. calculate_imbalance() bails out just before the hunk you
changed due to comparison of the sched_group avg_loads. sgs->avg_load is
basically the sum of load in the group divided by its capacity. Since
load isn't scaled the avg_load of the overloaded A15 group is slightly
_lower_ than the partially idle A7 group. Hence calculate_imbalance()
bails out, which isn't what we want.

I think we need to have a closer look at the imbalance calculation code
and any other users of sgs->avg_load to get rid of all code making
assumptions about cpu capacity.
We already had this discussion during the review of a previous version
https://lkml.org/lkml/2014/6/3/422
and my answer has not changed; This patch is necessary to solve the 1
task per CPU issue of the HMP system but is not enough. I have a patch
for solving the imbalance calculation issue and i have planned to send
it once this patch will be in a good shape for being accepted by
Peter.

I don't want to mix this patch and the next one because there are
addressing different problem: this one is how evaluate the capacity of
a system and detect when a group is overloaded and the next one will
handle the case when the balance of the system can't rely on the
average load figures of the group because we have a significant
capacity difference between groups. Not that i haven't specifically
mentioned HMP for the last patch because SMP system can also take
advantage of it
You do mention 'big' and 'little' cores in your commit message and quote
example numbers with are identical to the cpu capacities for TC2. That
lead me to believe that this patch would address the issues we see for
HMP systems. But that is clearly wrong. I would suggest that you drop
mentioning big and little cores and stick to only describing cpu
capacity reductions due to rt tasks and irq to avoid any confusion about
the purpose of the patch. Maybe explicitly say that non-default cpu
capacities (capacity_orig) isn't addressed yet.

I think the two problems you describe are very much related. This patch
set is half the solution of the HMP balancing problem. We just need the
last bit for avg_load and then we can add scale-invariance on top.

IMHO, it would be good to have all the bits and pieces for cpu capacity
scaling and scaling of per-entity load-tracking on the table so we fit
things together. We only have patches for parts of the solution posted
so far.

Morten
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help