[linux-pm] [PATCH 0/3] coupled cpuidle state support
From: Lorenzo Pieralisi <hidden>
Date: 2012-02-01 18:07:57
Also in:
linux-omap, linux-pm, linux-tegra, lkml
On Wed, Feb 01, 2012 at 05:30:15PM +0000, Colin Cross wrote:
> On Wed, Feb 1, 2012 at 6:59 AM, Lorenzo Pieralisi [off-list ref] wrote:
>> On Wed, Feb 01, 2012 at 12:13:26PM +0000, Vincent Guittot wrote: [...]
>>>>> In your patch, you put in safe state (WFI for most platforms) the cpus that become idle, and these cpus are woken up each time a new cpu of the cluster becomes idle. Then, the cluster state is chosen and the cpus enter the selected C-state. On ux500, we are using another behavior for synchronizing the cpus. The cpus are prepared to enter the C-state that has been chosen by the governor, and the last cpu that enters idle chooses the final cluster state (according to the cpus' C-states). The main advantage of this solution is that you don't need to wake other cpus to enter the C-state of a cluster. This can be quite worthwhile when tasks mainly run on one cpu. Have you also thought about such behavior when developing the coupled cpuidle driver? It could be interesting to add such behavior.
>>>>
>>>> Waking up the cpus that are in the safe state is not done just to choose the target state, it's done to allow the cpus to take themselves to the target low power state. On ux500, are you saying you take the cpus directly from the safe state to a lower power state without ever going back to the active state? I once implemented Tegra [...]
>>>
>>> yes it is
>>
>> But if there is a single power rail for the entire cluster, when a CPU is "prepared" for shutdown this means that you have to save the context and clean L1, maybe for nothing, since if other CPUs are up and running the CPU going idle can just enter a simple standby wfi (clock-gated but power on). With Colin's approach, context is saved and L1 cleaned only when it is almost certain the cluster is powered off (so the CPUs). It is a trade-off, I am not saying one approach is better than the other; we just have to make sure that preparing the CPU for "possible" shutdown is better than sending IPIs to take CPUs out of wfi and synchronize them (this happens if and only if CPUs enter coupled C-states).
>> As usual this will depend on use cases (and silicon implementations :) ). It is definitely worth benchmarking them.
>
> I'm less worried about performance, and more worried about race conditions. How do you deal with the following situation:
>
> CPU0 goes to WFI, and saves its state
> CPU1 goes idle, and selects a deep idle state that powers down CPU0
> CPU1 saves its state, and is about to trigger the power down
> CPU0 gets an interrupt, restores its state, and modifies state (maybe takes a spinlock during boot)
> CPU1 cuts the power to CPU0
>
> On OMAP4, the race is handled in hardware. When CPU1 tries to cut the power to the blocks shared by CPU0, the hardware will ignore the request if CPU0 is not in WFI. On Tegra2, there is no hardware support and I had to handle it with a spinlock implemented in scratch registers, because CPU0 is out of coherency when it starts booting and ldrex/strex don't work. I'm not convinced my implementation is correct, and I'd be curious to see any other implementations.
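The interleaving described above, and the kind of interlock attributed to OMAP4 hardware, can be replayed with a toy state machine. This is only an illustrative sketch: `cpu_state`, `cluster`, `deliver_interrupt` and `request_power_down` are hypothetical names, not OMAP4 or Tegra2 interfaces, and the model is deliberately single-threaded so the problematic ordering can be stepped through by hand.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU power states for a two-CPU cluster. */
enum cpu_state { CPU_RUNNING, CPU_WFI, CPU_OFF };

struct cluster {
    enum cpu_state cpu[2];
};

/* An interrupt takes a CPU out of WFI; it has no effect otherwise. */
void deliver_interrupt(struct cluster *c, int target)
{
    assert(target == 0 || target == 1);
    if (c->cpu[target] == CPU_WFI)
        c->cpu[target] = CPU_RUNNING;
}

/*
 * Model of a hardware interlock: the request to cut power to `target`
 * is ignored unless that CPU is still in WFI.
 * Returns true if power was actually cut.
 */
bool request_power_down(struct cluster *c, int target)
{
    assert(target == 0 || target == 1);
    if (c->cpu[target] != CPU_WFI)
        return false;          /* CPU woke up: request ignored */
    c->cpu[target] = CPU_OFF;
    return true;
}
```

With such a check in place, the late interrupt on CPU0 makes CPU1's power-down request fail instead of cutting power under a running CPU. Without it, software has to provide the equivalent guarantee itself, which is where the locking problem comes from.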
That's a problem you solved with coupled C-states (i.e. your example in the cover letter), where the primary waits for other CPUs to be reset before issuing the power down command, right? At that point in time secondaries cannot wake up (?) and if wfi (i.e. power down) aborts you just take the secondaries out of reset and restart executing simultaneously, correct? It mirrors the suspend behaviour, which is easier to deal with than completely random idle paths.

It is true that this should be managed by the PM HW; if HW is not capable of managing these situations things get nasty, as you highlighted. And it is also true that ldrex/strex on cacheable memory might not be available in those early warm-boot stages. I came up with a locking algorithm on strongly ordered memory to deal with that, but I am still not sure it is something we really really need.

I will test coupled C-state code ASAP, and come back with feedback.

Thanks,
Lorenzo
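For reference, one classic way to get mutual exclusion between two CPUs using only plain loads and stores (no ldrex/strex) is Peterson's algorithm. The sketch below is not the locking algorithm mentioned in this message, which is not shown here, and it is not Colin's scratch-register spinlock either; it is a generic illustration. All names (`peterson_lock`, `peterson_init`, `peterson_acquire`, `peterson_release`) are hypothetical, and C11 sequentially consistent atomics are used as a stand-in for accesses to strongly ordered memory, which is an assumption about how the lock words would be mapped on real hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Two-CPU Peterson lock; every field is accessed with single loads/stores. */
struct peterson_lock {
    atomic_bool wants[2];   /* wants[i]: CPU i wants to enter */
    atomic_int  turn;       /* which CPU must wait on a tie */
};

void peterson_init(struct peterson_lock *l)
{
    atomic_init(&l->wants[0], false);
    atomic_init(&l->wants[1], false);
    atomic_init(&l->turn, 0);
}

void peterson_acquire(struct peterson_lock *l, int me)
{
    int other = 1 - me;

    assert(me == 0 || me == 1);
    atomic_store(&l->wants[me], true);  /* announce intent */
    atomic_store(&l->turn, other);      /* give priority to the other CPU */
    /* Wait while the other CPU wants the lock and it is its turn. */
    while (atomic_load(&l->wants[other]) && atomic_load(&l->turn) == other)
        ;                               /* spin */
}

void peterson_release(struct peterson_lock *l, int me)
{
    assert(me == 0 || me == 1);
    atomic_store(&l->wants[me], false);
}
```

The algorithm relies on stores becoming visible in program order to the other CPU, which is why it only makes sense on memory with strong ordering guarantees (or with explicit barriers). It also handles exactly two contenders; more CPUs would need a different scheme.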