Thread (133 messages) 133 messages, 13 authors, 2016-03-10

Re: [PATCH v10 1/3] cpufreq: Add mechanism for registering utilization update callbacks

From: Vincent Guittot <vincent.guittot@linaro.org>
Date: 2016-02-22 14:33:29
Also in: lkml

On 22 February 2016 at 11:52, Peter Zijlstra [off-list ref] wrote:
On Fri, Feb 19, 2016 at 09:28:23AM -0800, Steve Muckle wrote:
quoted
On 02/19/2016 08:42 AM, Srinivas Pandruvada wrote:
quoted
We did experiments using util/max in intel_pstate. For some benchmarks
there were regression of 4 to 5%, for some benchmarks it performed at
par with getting utilization from the processor. Further optimization
in the algorithm is possible and still in progress. Idea is that we can
change P-State fast enough and be more reactive. Once I have good data,
I will send to this list. The algorithm can be part of the cpufreq
governor too.
There has been a lot of work in the area of scheduler-driven CPU
frequency selection by Linaro and ARM as well. It was posted most
recently a couple months ago:

http://thread.gmane.org/gmane.linux.power-management.general/69176

It was also posted as part of the energy-aware scheduling series last
July. There's a new RFC series forthcoming which I had hoped (and
failed) to post prior to my business travel this week; it should be out
next week. It will address the feedback received thus far along with
locking and other things.
Right, so I had a wee look at that again, and had a quick chat with Juri
on IRC. So the main difference seems to be that you guys want to know
why the utilization changed, as opposed to purely _that_ it changed.
Yes, the main goal was to be able to filter the useful and useless
update of rq's utilization in order to minimize/optimize the trig of
an update of the frequency. These patches have been made for a cpufreq
driver that reacts far slower than scheduler. It's might worth
starting with a simple solution and update it after
And hence you have callbacks all over the place.

I'm not too sure I really like that too much, it bloats the code and
somewhat obfuscates the point.

So I would really like there to be just the one callback when we
actually compute a new number, and that is update_load_avg().

Now I think we can 'easily' propagate the information you want into
update_load_avg() (see below), but I would like to see actual arguments
for why you would need this.
Your proposal is interesting except that we are interested in the rq's
utilization more that se's ones so we should better use
update_cfs_rq_load_avg and few additional place like
attach_entity_load_avg which bypasses update_cfs_rq_load_avg to update
rq's utilization and load
quoted hunk ↗ jump to hunk
For one, the migration bits don't really make sense. We typically do not
call migration code local on both cpus, typically just one, but possibly
neither. That means you cannot actually update the relevant CPU state
from these sites anyway.
quoted
The scheduler hooks for utilization-based cpufreq operation deserve a
lot more debate I think. They could quite possibly have different
requirements than hooks which are chosen just to guarantee periodic
callbacks into sampling-based governors.
I'll repeat what Rafael said, the periodic callback nature is a
'temporary' hack, simply because current cpufreq depends on that.

The idea is to wane cpufreq off of that requirement and then drop that
part.

Very-much-not-signed-off-by: Peter Zijlstra
---
 kernel/sched/fair.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7ce24a456322..f3e95d8b65c3 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2528,6 +2528,17 @@ static inline void update_cfs_shares(struct cfs_rq *cfs_rq)
 }
 #endif /* CONFIG_FAIR_GROUP_SCHED */

+enum load_update_type {
+       LOAD_NONE,
+       LOAD_TICK,
+       LOAD_PUT,
+       LOAD_SET,
+       LOAD_ENQUEUE,
+       LOAD_DEQUEUE,
+       LOAD_ENQUEUE_MOVE = LOAD_ENQUEUE + 2,
+       LOAD_DEQUEUE_MOVE = LOAD_DEQUEUE + 2,
+};
+
 #ifdef CONFIG_SMP
 /* Precomputed fixed inverse multiplies for multiplication by y^n */
 static const u32 runnable_avg_yN_inv[] = {
@@ -2852,7 +2863,8 @@ static inline int update_cfs_rq_load_avg(u64 now, struct cfs_rq *cfs_rq)
 }

 /* Update task and its cfs_rq load average */
-static inline void update_load_avg(struct sched_entity *se, int update_tg)
+static inline void update_load_avg(struct sched_entity *se, int update_tg,
+                                  enum load_update_type type)
 {
        struct cfs_rq *cfs_rq = cfs_rq_of(se);
        u64 now = cfs_rq_clock_task(cfs_rq);
@@ -2940,7 +2952,7 @@ enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 static inline void
 dequeue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-       update_load_avg(se, 1);
+       update_load_avg(se, 1, LOAD_DEQUEUE);

        cfs_rq->runnable_load_avg =
                max_t(long, cfs_rq->runnable_load_avg - se->avg.load_avg, 0);
@@ -3006,7 +3018,8 @@ static int idle_balance(struct rq *this_rq);

 #else /* CONFIG_SMP */

-static inline void update_load_avg(struct sched_entity *se, int update_tg) {}
+static inline void update_load_avg(struct sched_entity *se, int update_tg,
+                                  enum load_update_type type) {}
 static inline void
 enqueue_entity_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se) {}
 static inline void
@@ -3327,7 +3340,7 @@ set_next_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
                if (schedstat_enabled())
                        update_stats_wait_end(cfs_rq, se);
                __dequeue_entity(cfs_rq, se);
-               update_load_avg(se, 1);
+               update_load_avg(se, 1, LOAD_SET);
        }

        update_stats_curr_start(cfs_rq, se);
@@ -3431,7 +3444,7 @@ static void put_prev_entity(struct cfs_rq *cfs_rq, struct sched_entity *prev)
                /* Put 'current' back into the tree. */
                __enqueue_entity(cfs_rq, prev);
                /* in !on_rq case, update occurred at dequeue */
-               update_load_avg(prev, 0);
+               update_load_avg(prev, 0, LOAD_PUT);
        }
        cfs_rq->curr = NULL;
 }
@@ -3447,7 +3460,7 @@ entity_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr, int queued)
        /*
         * Ensure that runnable average is periodically updated.
         */
-       update_load_avg(curr, 1);
+       update_load_avg(curr, 1, LOAD_TICK);
        update_cfs_shares(cfs_rq);

 #ifdef CONFIG_SCHED_HRTICK
@@ -4320,7 +4333,7 @@ enqueue_task_fair(struct rq *rq, struct task_struct *p, int flags)
                if (cfs_rq_throttled(cfs_rq))
                        break;

-               update_load_avg(se, 1);
+               update_load_avg(se, 1, LOAD_ENQUEUE + (p->on_rq & TASK_ON_RQ_MIGRATING));
                update_cfs_shares(cfs_rq);
        }
@@ -4380,7 +4393,7 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
                if (cfs_rq_throttled(cfs_rq))
                        break;

-               update_load_avg(se, 1);
+               update_load_avg(se, 1, LOAD_DEQUEUE + (p->on_rq & TASK_ON_RQ_MIGRATING));
                update_cfs_shares(cfs_rq);
        }

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help