[PATCH v8 07/26] PM / Domains: Add genpd governor for CPUs
From: rafael@kernel.org (Rafael J. Wysocki)
Date: 2018-09-14 11:34:30
Also in: linux-arm-msm, linux-pm, lkml
On Fri, Sep 14, 2018 at 12:44 PM Lorenzo Pieralisi wrote:
> On Fri, Sep 14, 2018 at 11:50:15AM +0200, Rafael J. Wysocki wrote:
>> On Thursday, August 9, 2018 5:39:25 PM CEST Lorenzo Pieralisi wrote:
>>> On Mon, Aug 06, 2018 at 11:20:59AM +0200, Rafael J. Wysocki wrote:
>>> [...]
>>>>>>> @@ -245,6 +248,56 @@ static bool always_on_power_down_ok(struct dev_pm_domain *domain)
>>>>>>>  	return false;
>>>>>>>  }
>>>>>>>  
>>>>>>> +static bool cpu_power_down_ok(struct dev_pm_domain *pd)
>>>>>>> +{
>>>>>>> +	struct generic_pm_domain *genpd = pd_to_genpd(pd);
>>>>>>> +	ktime_t domain_wakeup, cpu_wakeup;
>>>>>>> +	s64 idle_duration_ns;
>>>>>>> +	int cpu, i;
>>>>>>> +
>>>>>>> +	if (!(genpd->flags & GENPD_FLAG_CPU_DOMAIN))
>>>>>>> +		return true;
>>>>>>> +
>>>>>>> +	/*
>>>>>>> +	 * Find the next wakeup for any of the online CPUs within the PM domain
>>>>>>> +	 * and its subdomains. Note, we only need the genpd->cpus, as it already
>>>>>>> +	 * contains a mask of all CPUs from subdomains.
>>>>>>> +	 */
>>>>>>> +	domain_wakeup = ktime_set(KTIME_SEC_MAX, 0);
>>>>>>> +	for_each_cpu_and(cpu, genpd->cpus, cpu_online_mask) {
>>>>>>> +		cpu_wakeup = tick_nohz_get_next_wakeup(cpu);
>>>>>>> +		if (ktime_before(cpu_wakeup, domain_wakeup))
>>>>>>> +			domain_wakeup = cpu_wakeup;
>>>>>>> +	}
>>>>>>
>>>>>> Here's a concern I have missed before. :-/
>>>>>>
>>>>>> Say, one of the CPUs you're walking here is woken up in the meantime.
>>>>>
>>>>> Yes, that can happen when we have mispredicted the "next wakeup".
>>>>>> I don't think it is valid to evaluate tick_nohz_get_next_wakeup() for it and then to update domain_wakeup. We really should just avoid the domain power-off altogether in that case, IMO.
>>>>>
>>>>> Correct. However, we also want to avoid lock contention in the idle path, which is what this boils down to.
>>>>
>>>> This already is done under genpd_lock() AFAICS, so I'm not quite sure what exactly you mean.
>>>>
>>>> Besides, this is not just about increased latency, which is a concern by itself but maybe not so much in all environments, but also about the possibility of missing a CPU wakeup, which is a major issue.
>>>>
>>>> If one of the CPUs sharing the domain with the current one is woken up during cpu_power_down_ok() and the wakeup is an edge-triggered interrupt and the domain is turned off regardless, the wakeup may be missed entirely, if I'm not mistaken.
>>>>
>>>> It looks like there needs to be a way for the hardware to prevent a domain power-off when there's a pending interrupt, or I don't quite see how this can be handled correctly.
>>>>>> Sure enough, if the domain power-off has already started and one of the CPUs in the domain is woken up then, too bad, it will suffer the latency (but in that case the hardware should be able to help somewhat), but otherwise a CPU wakeup should prevent the domain power-off from being carried out.
>>>>>
>>>>> The CPU is not prevented from waking up, as we rely on the FW to deal with that.
>>>>>
>>>>> Even if the above computation turns out to wrongly suggest that the cluster can be powered off, the FW, together with the genpd backend driver, shall prevent it.
>>>>
>>>> Fine, but then the solution depends on specific FW/HW behavior, so I'm not sure how generic it really is. At least, that expectation should be clearly documented somewhere, preferably in code comments.
>>>>> To cover this case for PSCI, we also use a per-CPU variable for the CPU's power-off state, as can be seen later in the series.
>>>>
>>>> Oh great, but the generic part should be independent of the underlying implementation of the driver. If it isn't, then it also is not generic.
>>>>> Hope this clarifies your concern; otherwise tell me and I will elaborate a bit more.
>>>>
>>>> Not really.
>>>>
>>>> There also is one more problem and that is the interaction between this code and the idle governor.
>>>>
>>>> Namely, the idle governor may select a shallower state for some reason, for example due to an additional latency limit derived from CPU utilization (like in the menu governor), and how does the code in cpu_power_down_ok() know what state has been selected and how does it honor the selection made by the idle governor?
>>>
>>> That's a good question and it maybe gives a path towards a solution.
>>>
>>> AFAICS the genPD governor only selects the idle state parameter that determines the idle state at, say, GenPD cpumask level; it does not touch the CPUidle decision, which works on a subset of idle states (at CPU level).
>>
>> I've deferred responding to this as I wasn't quite sure if I followed you at that time, but I'm afraid I'm still not following you now. :-)
>>
>> The idle governor has to take the total worst-case wakeup latency into account. Not just from the logical CPU itself, but also from whatever state the SoC may end up in as a result of this particular logical CPU going idle, one way or another.
>>
>> So for example, if your logical CPU has an idle state A that may trigger an idle state X at the cluster level (if the other logical CPUs happen to be in the right states and so on), then the worst-case exit latency for that is the one of state X.
>
> I will provide an example:
>
> IDLE STATE A (affects CPU {0,1}): exit latency 1ms, min-residency 1.5ms
>
> CPU 0 is about to enter IDLE state A since its "next-event" fulfills the residency requirements and exit latency constraints.
>
> CPU 1 is in idle state A (given that CPU 0 is ON, some of the common logic shared between CPU {0,1} is still ON, but, as soon as CPU 0 enters idle state A, CPU {0,1} can enter the "full" idle state A power savings mode).
>
> The current CPUidle governor does not check the "next-event" for CPU 1, which may wake up in, say, 10us.
Right.
> Requesting IDLE STATE A is a waste of power (unless firmware or hardware demotes it by peeking at CPU 1's next event and actually demoting CPU 0's request).
OK, I see. That's because the state is "collaborative", so to speak. But wasn't that supposed to be covered by the "coupled" thing?
> The current flat list of idle states has no notion of CPUs sharing an idle state request, and that's where I think this series kicks in, and that's the reason I say that the genPD governor can only demote an idle state request.
>
> Linking power domains to idle states is the only sensible way I see to define which logical CPUs are affected by an idle state entry; this information is missing in the current kernel (whether it's worthwhile adding it is another question).
OK, thanks for the clarification! Cheers, Rafael