Re: [PATCH v2 1/2] sched/fair: Take thermal pressure into account while estimating energy
From: Lukasz Luba <lukasz.luba@arm.com>
Date: 2021-06-10 12:30:08
Also in:
lkml
On 6/10/21 1:19 PM, Vincent Guittot wrote:
On Thu, 10 Jun 2021 at 12:37, Lukasz Luba [off-list ref] wrote:quoted
On 6/10/21 11:07 AM, Dietmar Eggemann wrote:quoted
On 10/06/2021 11:04, Lukasz Luba wrote:quoted
[snip]quoted
quoted
Not always, it depends on thermal governor decision, workload and 'power actors' (in IPA naming convention). Then it depends when and how hard you clamp the CPUs. They (CPUs) don't have to be always overutilized, they might be even 50-70% utilized but the GPU reduced power budget by 2 Watts, so CPUs left with only 1W. Which is still OK for the CPUs, since they are only 'feeding' the GPU with new 'jobs'.All this pretty much confines the usefulness of you proposed change. A precise description of it with the patches is necessary to allow people to start from there while exploring your patches.OK, I see your point. [snip]quoted
quoted
True, I hope this description above would help to understand the scenario.This description belongs in the patch header. The scenario in which your functionality would improve things has to be clear. I'm sure that not everybody looking at this patches is immediately aware on how IPA setups work and which specific setup you have in mind here.Agree. I will add this description into the patch header for v3. [snip]quoted
quoted
Yes, this code implementation tries to address those issues.The point I was making here is: why using the PELT signal thermal_load_avg() and not per_cpu(thermal_pressure, cpu) directly, given the fact that the latter perfectly represents the frequency clamping?Good question. I wanted to be aligned with other parts in the fair.c like cpu_capacity() and all it's users. The CPU capacity is reduced by RT, DL, IRQ and thermal load avg, not the 'raw' value from the per-cpu variable. TBH I cannot recall what was the argument back then when thermal pressure geometric series was introduced. Maybe to have a better control how fast it raises and decays so other mechanisms in the scheduler will see the change in thermal as not so sharp... (?) Vincent do you remember the motivation to have geometric series in thermal pressure and not use just the 'raw' value from per-cpu?In order to have thermal pressure synced with other metrics used by the scheduler like util/rt/dl_avg. As an example, when thermal pressure will decrease because thermal capping is removed, the utilization will increase at the same pace as thermal will decrease and it will not create some fake spare cycle. util_avg is the average expected utilization of the cpu, thermal pressure reflects the average stolen capacity for the same averaging time scale but this can be the result of a toggle between several OPP Using current capping is quite volatile to make a decision as it might have changed by the time you apply your decision.
So for this scenario, where we want to just align EAS with SchedUtil frequency decision, which is instantaneous and has 'raw' value of capping from policy->max, shouldn't we use: thermal_pressure = arch_scale_thermal_pressure(cpu_id) instead of geometric series thermal_pressure signal? This would avoid the hassling with idle CPUs and not updated thermal signal.