Re: [PATCH 6/6] cpufreq: schedutil: New governor based on scheduler utilization data
From: Peter Zijlstra <peterz@infradead.org>
Date: 2016-03-03 16:37:42
Also in: linux-acpi, lkml
On Thu, Mar 03, 2016 at 05:24:32PM +0100, Rafael J. Wysocki wrote:
> On Thu, Mar 3, 2016 at 1:20 PM, Peter Zijlstra [off-list ref] wrote:
> > On Wed, Mar 02, 2016 at 11:49:48PM +0100, Rafael J. Wysocki wrote:
> > > > +       min_f = sg_policy->policy->cpuinfo.min_freq;
> > > > +       max_f = sg_policy->policy->cpuinfo.max_freq;
> > > > +       next_f = util > max ? max_f : min_f + util * (max_f - min_f) / max;
> > >
> > > In case a more formal derivation of this formula is needed, it is based on the following 3 assumptions:
> > >
> > > (1) Performance is a linear function of frequency.
> > > (2) Required performance is a linear function of the utilization ratio x = util/max as provided by the scheduler (0 <= x <= 1).
> > > (3) The minimum possible frequency (min_freq) corresponds to x = 0 and the maximum possible frequency (max_freq) corresponds to x = 1.
> > >
> > > (1) and (2) combined imply that
> > >
> > >   f = a * x + b
> > >
> > > (f - frequency, a, b - constants to be determined) and then (3) quite trivially leads to b = min_freq and a = max_freq - min_freq.
> >
> > 3 is the problem, that just doesn't make sense and is probably the reason why you see very little selection of the min freq.
>
> It is about mapping the entire [0,1] interval to the available frequency range.

Yeah, but I don't see why that makes sense..

> It will overprovision things (the smaller x the more), but then it may help the race-to-idle a bit in theory.

So, since we also have the cpuidle information, could we not make a better guess at race-to-idle?

> > Suppose a machine with the following frequencies:
> >
> >   500, 750, 1000
> >
> > And a utilization of 0.4, how does asking for 500 + 0.4 * (1000-500) = 700 make any sense? Per your point 1, it should be asking for 0.4 * 1000 = 400.
> >
> > Because, per 1, at 500 it runs exactly half as fast as at 1000, and we only need 0.4 times as much. Therefore 500 is more than sufficient.
>
> OK, but then I don't see why this reasoning only applies to the lower bound of the frequency range. Is there any reason why x = 1 should be the only point mapping to max_freq?

Well, everything that goes over the second to last freq would end up at the last (max) freq.

Take again the 500,750,1000 example, everything that's >750 would end up at 1000 (for relation_l, >875 for _c).

But given the platform's cpuidle information, maybe coupled with an avg idle est, we can compute the benefit of race-to-idle and over provision based on that, right?
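
For reference, the 500/750/1000 arithmetic discussed above can be sketched in C as follows. This is only an illustration under stated assumptions: the function names are hypothetical and are not kernel APIs, the table is assumed to be sorted in ascending order, and the tie-breaking rule in the "closest" lookup is an arbitrary choice that may differ from what cpufreq actually does.

```c
#include <stddef.h>

/* Hypothetical helper: mapping from the quoted patch, f = min_f + x * (max_f - min_f). */
unsigned long target_affine(unsigned long util, unsigned long max,
                            unsigned long min_f, unsigned long max_f)
{
	return util > max ? max_f : min_f + util * (max_f - min_f) / max;
}

/* Hypothetical helper: proportional mapping, f = x * max_f. */
unsigned long target_proportional(unsigned long util, unsigned long max,
                                  unsigned long max_f)
{
	return util > max ? max_f : util * max_f / max;
}

/* "relation_l" style lookup: lowest table frequency at or above target. */
unsigned long resolve_l(const unsigned long *table, size_t n, unsigned long target)
{
	for (size_t i = 0; i < n; i++)
		if (table[i] >= target)
			return table[i];
	return table[n - 1];	/* target above the table: clamp to the max entry */
}

/* "_c" style lookup: table frequency closest to target.
 * Assumption: ties go to the lower frequency (arbitrary choice for this sketch). */
unsigned long resolve_c(const unsigned long *table, size_t n, unsigned long target)
{
	unsigned long best = table[0];
	unsigned long best_diff = best > target ? best - target : target - best;

	for (size_t i = 1; i < n; i++) {
		unsigned long diff = table[i] > target ? table[i] - target
		                                       : target - table[i];
		if (diff < best_diff) {
			best = table[i];
			best_diff = diff;
		}
	}
	return best;
}
```

With util = 400 and max = 1000 (x = 0.4), the affine mapping asks for 700, which resolve_l turns into 750, while the proportional mapping asks for 400, which resolve_l turns into 500.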

> If not, then I think it's reasonable to map the middle of the available frequency range to x = 0.5 and then we have b = 0 and a = (max_freq + min_freq) / 2.

So I really think that approach falls apart on the low util bits, you effectively always run above min speed, even if min is already vastly over provisioned.
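
A rough way to see the low-utilization effect is to ask at which utilization each mapping first requests more than min_freq. The C sketch below does that by brute force. Everything in it is an assumption made for illustration: the names are hypothetical, the utilization scale of 1024 and the frequencies in kHz (500000 to 1000000) are example values, and the mapping in the first function is simply the formula from the quoted patch.

```c
/* Hypothetical helper: mapping from the quoted patch, f = min_f + x * (max_f - min_f). */
unsigned long map_affine(unsigned long util, unsigned long max,
                         unsigned long min_f, unsigned long max_f)
{
	return util > max ? max_f : min_f + util * (max_f - min_f) / max;
}

/* Hypothetical helper: proportional mapping, f = x * max_f (min_f is unused). */
unsigned long map_proportional(unsigned long util, unsigned long max,
                               unsigned long min_f, unsigned long max_f)
{
	(void)min_f;
	return util > max ? max_f : util * max_f / max;
}

typedef unsigned long (*map_fn)(unsigned long util, unsigned long max,
                                unsigned long min_f, unsigned long max_f);

/* Smallest util in [0, max] whose requested frequency exceeds min_f,
 * or max + 1 if no such util exists. */
unsigned long first_util_above_min(map_fn map, unsigned long max,
                                   unsigned long min_f, unsigned long max_f)
{
	for (unsigned long util = 0; util <= max; util++)
		if (map(util, max, min_f, max_f) > min_f)
			return util;
	return max + 1;
}
```

With the example values, the affine mapping already requests more than min_freq at util = 1 out of 1024, whereas the proportional mapping stays at or below min_freq until util = 513, that is, until utilization passes one half.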