[PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity

[PATCH v5 00/12] sched: consolidation of cpu_capacity · Vincent Guittot <vincent.guittot@linaro.org> · 2014-08-26
[PATCH v5 01/12] sched: fix imbalance flag reset · Vincent Guittot <vincent.guittot@linaro.org> · 2014-08-26
[PATCH v5 03/12] sched: fix avg_load computation · Vincent Guittot <vincent.guittot@linaro.org> · 2014-08-26
Re: [PATCH v5 03/12] sched: fix avg_load computation · Preeti U Murthy <hidden> · 2014-08-30
Re: [PATCH v5 03/12] sched: fix avg_load computation · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-03
Re: [PATCH v5 03/12] sched: fix avg_load computation · Tim Chen <hidden> · 2014-09-03
Re: [PATCH v5 03/12] sched: fix avg_load computation · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-04
Re: [PATCH v5 03/12] sched: fix avg_load computation · Tim Chen <hidden> · 2014-09-04
Re: [PATCH v5 03/12] sched: fix avg_load computation · Preeti U Murthy <hidden> · 2014-09-05
[PATCH v5 04/12] sched: Allow all archs to set the capacity_orig · Vincent Guittot <vincent.guittot@linaro.org> · 2014-08-26
Re: [PATCH v5 04/12] sched: Allow all archs to set the capacity_orig · Kamalesh Babulal <hidden> · 2014-08-27
Re: [PATCH v5 04/12] sched: Allow all archs to set the capacity_orig · Preeti U Murthy <hidden> · 2014-08-30
Re: [PATCH v5 04/12] sched: Allow all archs to set the capacity_orig · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-01
Re: [PATCH v5 04/12] sched: Allow all archs to set the capacity_orig · Preeti U Murthy <hidden> · 2014-09-03
Re: [PATCH v5 04/12] sched: Allow all archs to set the capacity_orig · Peter Zijlstra <peterz@infradead.org> · 2014-09-10
Re: [PATCH v5 04/12] sched: Allow all archs to set the capacity_orig · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-10
Re: [PATCH v5 04/12] sched: Allow all archs to set the capacity_orig · Preeti U Murthy <hidden> · 2014-09-11
[PATCH v5 07/12] sched: test the cpu's capacity in wake affine · Vincent Guittot <vincent.guittot@linaro.org> · 2014-08-26
Re: [PATCH v5 07/12] sched: test the cpu's capacity in wake affine · Peter Zijlstra <peterz@infradead.org> · 2014-09-10
[PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity · Vincent Guittot <vincent.guittot@linaro.org> · 2014-08-26
Re: [PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity · Preeti U Murthy <hidden> · 2014-08-30
Re: [PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-01
Re: [PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity · Preeti U Murthy <hidden> · 2014-09-03
Re: [PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-03
Re: [PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity · Preeti U Murthy <hidden> · 2014-09-03
Re: [PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-03
Re: [PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity · Peter Zijlstra <peterz@infradead.org> · 2014-09-11
Re: [PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity · Preeti U Murthy <hidden> · 2014-09-05
Re: [PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-05
Re: [PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity · Peter Zijlstra <peterz@infradead.org> · 2014-09-11
Re: [PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-11
Re: [PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity · Peter Zijlstra <peterz@infradead.org> · 2014-09-11
Re: [PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-11
Re: [PATCH v5 08/12] sched: move cfs task on a CPU with higher capacity · Peter Zijlstra <peterz@infradead.org> · 2014-09-11
[PATCH v5 12/12] sched: add SD_PREFER_SIBLING for SMT level · Vincent Guittot <vincent.guittot@linaro.org> · 2014-08-26
[PATCH v5 11/12] sched: replace capacity_factor by utilization · Vincent Guittot <vincent.guittot@linaro.org> · 2014-08-26
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Peter Zijlstra <peterz@infradead.org> · 2014-09-11
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Peter Zijlstra <peterz@infradead.org> · 2014-09-11
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-11
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Peter Zijlstra <peterz@infradead.org> · 2014-09-14
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Peter Zijlstra <peterz@infradead.org> · 2014-09-14
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Peter Zijlstra <peterz@infradead.org> · 2014-09-15
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Nicolas Pitre <hidden> · 2014-09-15
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Peter Zijlstra <peterz@infradead.org> · 2014-09-16
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Morten Rasmussen <hidden> · 2014-09-17
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Morten Rasmussen <hidden> · 2014-09-17
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Peter Zijlstra <peterz@infradead.org> · 2014-09-17
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-15
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-15
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Peter Zijlstra <peterz@infradead.org> · 2014-09-17
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-18
Re: [PATCH v5 11/12] sched: replace capacity_factor by utilization · Dietmar Eggemann <dietmar.eggemann@arm.com> · 2014-09-16
[PATCH v5 10/12] sched: get CPU's utilization statistic · Vincent Guittot <vincent.guittot@linaro.org> · 2014-08-26
Re: [PATCH v5 10/12] sched: get CPU's utilization statistic · Peter Zijlstra <peterz@infradead.org> · 2014-09-11
Re: [PATCH v5 10/12] sched: get CPU's utilization statistic · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-11
Re: [PATCH v5 10/12] sched: get CPU's utilization statistic · Peter Zijlstra <peterz@infradead.org> · 2014-09-11
Re: [PATCH v5 10/12] sched: get CPU's utilization statistic · Nicolas Pitre <hidden> · 2014-09-11
Re: [PATCH v5 10/12] sched: get CPU's utilization statistic · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-12
Re: [PATCH v5 10/12] sched: get CPU's utilization statistic · Morten Rasmussen <hidden> · 2014-09-15
Re: [PATCH v5 10/12] sched: get CPU's utilization statistic · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-16
Re: [PATCH v5 10/12] sched: get CPU's utilization statistic · Morten Rasmussen <hidden> · 2014-09-15
[PATCH v5 09/12] sched: add usage_load_avg · Vincent Guittot <vincent.guittot@linaro.org> · 2014-08-26
Re: [PATCH v5 09/11] sched: add usage_load_avg · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-04
Re: [PATCH v5 09/11] sched: add usage_load_avg · Peter Zijlstra <peterz@infradead.org> · 2014-09-11
Re: [PATCH v5 09/12] sched: add usage_load_avg · Peter Zijlstra <peterz@infradead.org> · 2014-09-11
Re: [PATCH v5 09/12] sched: add usage_load_avg · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-11
Re: [PATCH v5 09/12] sched: add usage_load_avg · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-11
Re: [PATCH v5 09/12] sched: add usage_load_avg · Morten Rasmussen <hidden> · 2014-09-15
Re: [PATCH v5 09/12] sched: add usage_load_avg · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-15
[PATCH v5 06/12] sched: add per rq cpu_capacity_orig · Vincent Guittot <vincent.guittot@linaro.org> · 2014-08-26
Re: [PATCH v5 06/12] sched: add per rq cpu_capacity_orig · Kamalesh Babulal <hidden> · 2014-08-27
Re: [PATCH v5 06/12] sched: add per rq cpu_capacity_orig · Vincent Guittot <vincent.guittot@linaro.org> · 2014-08-28
Re: [PATCH v5 06/12] sched: add per rq cpu_capacity_orig · Peter Zijlstra <peterz@infradead.org> · 2014-09-10
Re: [PATCH v5 06/12] sched: add per rq cpu_capacity_orig · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-10
Re: [PATCH v5 06/12] sched: add per rq cpu_capacity_orig · Nicolas Pitre <hidden> · 2014-09-11
Re: [PATCH v5 06/12] sched: add per rq cpu_capacity_orig · Vincent Guittot <vincent.guittot@linaro.org> · 2014-09-15
[PATCH v5 05/12] ARM: topology: use new cpu_capacity interface · Vincent Guittot <vincent.guittot@linaro.org> · 2014-08-26
Re: [PATCH v5 05/12] ARM: topology: use new cpu_capacity interface · Nicolas Pitre <hidden> · 2014-09-11
[PATCH v5 02/12] sched: remove a wake_affine condition · Vincent Guittot <vincent.guittot@linaro.org> · 2014-08-26

From: Preeti U Murthy <hidden>
Date: 2014-09-03 12:26:37
Also in: lkml

On 09/03/2014 05:14 PM, Vincent Guittot wrote:

On 3 September 2014 11:11, Preeti U Murthy [off-list ref] wrote:

quoted

On 09/01/2014 02:15 PM, Vincent Guittot wrote:

quoted

On 30 August 2014 19:50, Preeti U Murthy [off-list ref] wrote:

quoted

Hi Vincent,

quoted

index 18db43e..60ae1ce 100644

--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c

@@ -6049,6 +6049,14 @@ static bool update_sd_pick_busiest(struct lb_env *env,
                      return true;
      }

+     /*
+      * The group capacity is reduced probably because of activity from other
+      * sched class or interrupts which use part of the available capacity
+      */
+     if ((sg->sgc->capacity_orig * 100) > (sgs->group_capacity *
+                             env->sd->imbalance_pct))

Wouldn't the check on avg_load let us know if we are packing more tasks
in this group than its capacity ? Isn't that the metric we are more
interested in?

With  this patch, we don't want to pack but we prefer to spread the
task on another CPU than the one which handles the interruption if
latter uses a significant part of the CPU capacity.

quoted

+             return true;
+
      return false;
 }

@@ -6534,13 +6542,23 @@ static int need_active_balance(struct lb_env *env)
      struct sched_domain *sd = env->sd;

      if (env->idle == CPU_NEWLY_IDLE) {
+             int src_cpu = env->src_cpu;

              /*
               * ASYM_PACKING needs to force migrate tasks from busy but
               * higher numbered CPUs in order to pack all tasks in the
               * lowest numbered CPUs.
               */
-             if ((sd->flags & SD_ASYM_PACKING) && env->src_cpu > env->dst_cpu)
+             if ((sd->flags & SD_ASYM_PACKING) && src_cpu > env->dst_cpu)
+                     return 1;
+
+             /*
+              * If the CPUs share their cache and the src_cpu's capacity is
+              * reduced because of other sched_class or IRQs, we trig an
+              * active balance to move the task
+              */
+             if ((capacity_orig_of(src_cpu) * 100) > (capacity_of(src_cpu) *
+                             sd->imbalance_pct))
                      return 1;
      }
@@ -6643,6 +6661,8 @@ static int load_balance(int this_cpu, struct rq *this_rq,

      schedstat_add(sd, lb_imbalance[idle], env.imbalance);

+     env.src_cpu = busiest->cpu;
+
      ld_moved = 0;
      if (busiest->nr_running > 1) {
              /*
@@ -6652,7 +6672,6 @@ static int load_balance(int this_cpu, struct rq *this_rq,
               * correctly treated as an imbalance.
               */
              env.flags |= LBF_ALL_PINNED;
-             env.src_cpu   = busiest->cpu;
              env.src_rq    = busiest;
              env.loop_max  = min(sysctl_sched_nr_migrate, busiest->nr_running);
@@ -7359,10 +7378,12 @@ static void nohz_idle_balance(struct rq *this_rq, enum cpu_idle_type idle)

 /*
  * Current heuristic for kicking the idle load balancer in the presence
- * of an idle cpu is the system.
+ * of an idle cpu in the system.
  *   - This rq has more than one task.
- *   - At any scheduler domain level, this cpu's scheduler group has multiple
- *     busy cpu's exceeding the group's capacity.
+ *   - This rq has at least one CFS task and the capacity of the CPU is
+ *     significantly reduced because of RT tasks or IRQs.
+ *   - At parent of LLC scheduler domain level, this cpu's scheduler group has
+ *     multiple busy cpu.
  *   - For SD_ASYM_PACKING, if the lower numbered cpu's in the scheduler
  *     domain span are idle.
  */
@@ -7372,9 +7393,10 @@ static inline int nohz_kick_needed(struct rq *rq)
      struct sched_domain *sd;
      struct sched_group_capacity *sgc;
      int nr_busy, cpu = rq->cpu;
+     bool kick = false;

      if (unlikely(rq->idle_balance))
-             return 0;
+             return false;

        /*
      * We may be recently in ticked or tickless idle mode. At the first
@@ -7388,38 +7410,45 @@ static inline int nohz_kick_needed(struct rq *rq)
       * balancing.
       */
      if (likely(!atomic_read(&nohz.nr_cpus)))
-             return 0;
+             return false;

      if (time_before(now, nohz.next_balance))
-             return 0;
+             return false;

      if (rq->nr_running >= 2)

Will this check ^^ not catch those cases which this patch is targeting?

This patch is not about how many tasks are running but if the capacity
of the CPU is reduced because of side activity like interruptions
which are only visible in the capacity value (with IRQ_TIME_ACCOUNTING
config) but not in nr_running.
Even if the capacity is reduced because of RT tasks, nothing ensures
that the RT task will be running when the tick fires

Regards,
Vincent

quoted

Regards
Preeti U Murthy

quoted

-             goto need_kick;
+             return true;

      rcu_read_lock();
      sd = rcu_dereference(per_cpu(sd_busy, cpu));
-
      if (sd) {
              sgc = sd->groups->sgc;
              nr_busy = atomic_read(&sgc->nr_busy_cpus);

-             if (nr_busy > 1)
-                     goto need_kick_unlock;
+             if (nr_busy > 1) {
+                     kick = true;
+                     goto unlock;
+             }
+
      }

-     sd = rcu_dereference(per_cpu(sd_asym, cpu));
+     sd = rcu_dereference(rq->sd);
+     if (sd) {
+             if ((rq->cfs.h_nr_running >= 1) &&
+                             ((rq->cpu_capacity * sd->imbalance_pct) <
+                             (rq->cpu_capacity_orig * 100))) {

Ok I understand your explanation above. But I was wondering if you would
need to add this check around rq->cfs.h_nr_running >= 1 in the above two
cases as well.

yes you're right for the test if (rq->nr_running >= 2).

It's not so straight forward for nr_busy_cpus which reflects how many
CPUs have not stopped their tick. The scheduler assumes that the
latter are busy with cfs tasks

quoted

I have actually raised this concern over whether we should be using
rq->nr_running or cfs_rq->nr_running while we do load balancing in reply
to your patch3. While all our load measurements are about the cfs_rq

I have just replied to your comments on patch 3. Sorry for the delay

quoted

load, we use rq->nr_running, which may include tasks from other sched
classes as well. We divide them to get average load per task during load
balancing which is wrong, isn't it? Similarly during nohz_kick_needed(),
we trigger load balancing based on rq->nr_running again.

In this patch too, even if you know that the cpu is being dominated by
tasks that do not belong to cfs class, you would be triggering a futile
load balance if there are no fair tasks to move.

This patch adds one additional condition that is based on
rq->cfs.h_nr_running so it should not trigger any futile load balance.
Then, I have also take advantage of this patch to clean up
nohz_kick_needed as proposed by Peter but the conditions are the same
than previously (except the one with rq->cfs.h_nr_running)

I can prepare another patchset that will solve the concerns that you
raised for nohz_kick_needed and in patch 3 but i would prefer not
include them in this patchset which is large enough and which subject
is a bit different.
Does it seem ok for you ?

Sure Vincent, thanks! I have in fact sent out a mail raising my concern
over rq->nr_running. If others agree on the issue to be existing, maybe
we can work on this next patchset that can clean this up in all places
necessary and not just in nohz_kick_needed().

Regards
Preeti U Murthy

Regards,
Vincent

quoted

Regards
Preeti U Murthy

`h`	back out one level
`j`	next message in thread
`k`	previous message in thread
`l`	drill in
`Esc`	close help / fold thread tree
`?`	toggle this help