Thread (68 messages) 68 messages, 9 authors, 2017-05-08

Re: [RFC][PATCH v2 2/2] cpufreq: schedutil: Avoid decreasing frequency of busy CPUs

From: Patrick Bellasi <hidden>
Date: 2017-03-21 11:56:32
Also in: lkml

On 21-Mar 09:50, Vincent Guittot wrote:
On 20 March 2017 at 22:46, Rafael J. Wysocki [off-list ref] wrote:
quoted
From: Rafael J. Wysocki <redacted>

The way the schedutil governor uses the PELT metric causes it to
underestimate the CPU utilization in some cases.

That can be easily demonstrated by running kernel compilation on
a Sandy Bridge Intel processor, running turbostat in parallel with
it and looking at the values written to the MSR_IA32_PERF_CTL
register.  Namely, the expected result would be that when all CPUs
were 100% busy, all of them would be requested to run in the maximum
P-state, but observation shows that this clearly isn't the case.
The CPUs run in the maximum P-state for a while and then are
requested to run slower and go back to the maximum P-state after
a while again.  That causes the actual frequency of the processor to
visibly oscillate below the sustainable maximum in a jittery fashion
which clearly is not desirable.

To work around this issue use the observation that, from the
schedutil governor's perspective, it does not make sense to decrease
the frequency of a CPU that doesn't enter idle and avoid decreasing
the frequency of busy CPUs.
I don't fully agree with that statement.
If there are 2 runnable tasks on CPU A and scheduler migrates the
waiting task to another CPU B so CPU A is less loaded now, it makes
sense to reduce the OPP. That's even for that purpose that we have
decided to use scheduler metrics in cpufreq governor so we can adjust
OPP immediately when tasks migrate.
That being said, i probably know why you see such OPP switches in your
use case. When we migrate a task, we also migrate/remove its
utilization from CPU.
If the CPU is not overloaded, it means that runnable tasks have all
computation that they need and don't have any reason to use more when
a task migrates to another CPU. so decreasing the OPP makes sense
because the utilzation is decreasing
If the CPU is overloaded, it means that runnable tasks have to share
CPU time and probably don't have all computations that they would like
so when a task migrate, the remaining tasks on the CPU will increase
their utilization and fill space left by the task that has just
migrated. So the CPU's utilization will decrease when a task migrates
(and as a result the OPP) but then its utilization will increase with
remaining tasks running more time as well as the OPP

So you need to make the difference between this 2 cases: Is a CPU
overloaded or not. You can't really rely on the utilization to detect
that but you could take advantage of the load which take into account
the waiting time of tasks
Right, we can use "overloaded" for the time being until we push the
"overutilized" bits.

[...]
quoted
+#ifdef CONFIG_NO_HZ_COMMON
+static bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu)
+{
+       unsigned long idle_calls = tick_nohz_get_idle_calls();
+       bool ret = idle_calls == sg_cpu->saved_idle_calls;
Vincent: are you proposing something like this?

   +     if (this_rq()->rd->overload)
   +             return false;
quoted
+
+       sg_cpu->saved_idle_calls = idle_calls;
+       return ret;
+}
+#else
+static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
+#endif /* CONFIG_NO_HZ_COMMON */
+
+static void sugov_update_commit(struct sugov_cpu *sg_cpu,
+                               struct sugov_policy *sg_policy,
+                               u64 time, unsigned int next_freq)
 {
        struct cpufreq_policy *policy = sg_policy->policy;

+       if (sugov_cpu_is_busy(sg_cpu) && next_freq < sg_policy->next_freq)
+               next_freq = sg_policy->next_freq;
+
        if (policy->fast_switch_enabled) {
                if (sg_policy->next_freq == next_freq) {
                        trace_cpu_frequency(policy->cur, smp_processor_id());
@@ -214,7 +234,7 @@ static void sugov_update_single(struct u
                sugov_iowait_boost(sg_cpu, &util, &max);
                next_f = get_next_freq(sg_policy, util, max);
        }
-       sugov_update_commit(sg_policy, time, next_f);
+       sugov_update_commit(sg_cpu, sg_policy, time, next_f);
 }
[...]

-- 
#include <best/regards.h>

Patrick Bellasi
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help