Thread: 155 messages, 12 authors, 2016-03-18

Re: [PATCH 6/6] cpufreq: schedutil: New governor based on scheduler utilization data

From: Peter Zijlstra <peterz@infradead.org>
Date: 2016-03-03 16:37:42
Also in: linux-acpi, lkml

On Thu, Mar 03, 2016 at 05:24:32PM +0100, Rafael J. Wysocki wrote:
On Thu, Mar 3, 2016 at 1:20 PM, Peter Zijlstra [off-list ref] wrote:
On Wed, Mar 02, 2016 at 11:49:48PM +0100, Rafael J. Wysocki wrote:
+       min_f = sg_policy->policy->cpuinfo.min_freq;
+       max_f = sg_policy->policy->cpuinfo.max_freq;
+       next_f = util > max ? max_f : min_f + util * (max_f - min_f) / max;
In case a more formal derivation of this formula is needed, it is
based on the following 3 assumptions:

(1) Performance is a linear function of frequency.
(2) Required performance is a linear function of the utilization ratio
x = util/max as provided by the scheduler (0 <= x <= 1).
(3) The minimum possible frequency (min_freq) corresponds to x = 0 and
the maximum possible frequency (max_freq) corresponds to x = 1.

(1) and (2) combined imply that

f = a * x + b

(f - frequency, a, b - constants to be determined) and then (3) quite
trivially leads to b = min_freq and a = max_freq - min_freq.
3 is the problem, that just doesn't make sense and is probably the
reason why you see very little selection of the min freq.
It is about mapping the entire [0,1] interval to the available frequency range.
Yeah, but I don't see why that makes sense..
It will overprovision things (the smaller x the more), but then it may
help the race-to-idle a bit in theory.
So, since we also have the cpuidle information, could we not make a
better guess at race-to-idle?
Suppose a machine with the following frequencies:

        500, 750, 1000

And a utilization of 0.4, how does asking for 500 + 0.4 * (1000-500) =
700 make any sense? Per your point 1, it should be asking for
0.4 * 1000 = 400.

Because, per 1, at 500 it runs exactly half as fast as at 1000, and we
only need 0.4 times as much. Therefore 500 is more than sufficient.
OK, but then I don't see why this reasoning only applies to the lower
bound of the frequency range.  Is there any reason why x = 1 should be
the only point mapping to max_freq?
Well, everything that goes over the second to last freq would end up at
the last (max) freq.

Take again the 500,750,1000 example, everything that's >750 would end up
at 1000 (for relation_l, >875 for _c).

But given the platform's cpuidle information, maybe coupled with an avg
idle est, we can compute the benefit of race-to-idle and over provision
based on that, right?
If not, then I think it's reasonable to map the middle of the
available frequency range to x = 0.5 and then we have b = 0 and a =
(max_freq + min_freq) / 2.
So I really think that approach falls apart on the low util bits, you
effectively always run above min speed, even if min is already vastly
over provisioned.