Thread: 29 messages, 8 authors, 2012-03-14

[linux-pm] [PATCH 0/3] coupled cpuidle state support

From: Lorenzo Pieralisi <hidden>
Date: 2012-02-01 18:07:57
Also in: linux-omap, linux-pm, linux-tegra, lkml

On Wed, Feb 01, 2012 at 05:30:15PM +0000, Colin Cross wrote:
On Wed, Feb 1, 2012 at 6:59 AM, Lorenzo Pieralisi wrote:
On Wed, Feb 01, 2012 at 12:13:26PM +0000, Vincent Guittot wrote:

[...]
In your patch, you put the cpus that become idle into a safe state (WFI on
most platforms), and these cpus are woken up each time another cpu of the
cluster becomes idle. Then the cluster state is chosen and the cpus enter
the selected C-state. On ux500, we use a different approach to synchronize
the cpus. Each cpu is prepared to enter the C-state chosen by the governor,
and the last cpu to enter idle chooses the final cluster state (according
to the cpus' C-states). The main advantage of this solution is that you
don't need to wake the other cpus to put the cluster into a C-state. This
can be quite worthwhile when tasks mainly run on one cpu. Did you also
consider such a behavior when developing the coupled cpuidle driver? It
could be interesting to add it.
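A minimal C sketch of the ux500-style scheme described above: each CPU records the state its governor chose, and the last CPU to go idle picks the cluster state. The names (`cpu_enter_idle`, `requested_state`, `NR_CPUS`) are hypothetical, not taken from the ux500 driver, and the rule "the cluster goes no deeper than the shallowest requested state" is an assumption about what "according to the cpus' C-states" means. The sketch does nothing about a CPU waking up while the last CPU is powering the cluster down.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 2   /* assumption: a two-CPU cluster */

/* Hypothetical per-CPU record of the C-state index chosen by the governor
 * (0 = shallowest, e.g. WFI; larger = deeper). */
static int requested_state[NR_CPUS];
/* Number of CPUs currently idle. */
static atomic_int idle_count;

/* Called by a CPU on idle entry, after preparing for its chosen state.
 * Returns the cluster state to program if this CPU is the last one in,
 * or -1 if another CPU is still running and the cluster must stay up. */
int cpu_enter_idle(int cpu, int governor_state)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    requested_state[cpu] = governor_state;

    /* atomic_fetch_add returns the previous count */
    if (atomic_fetch_add(&idle_count, 1) + 1 < NR_CPUS)
        return -1;

    /* Last CPU in: the cluster goes no deeper than the shallowest request */
    int cluster_state = requested_state[0];
    for (int i = 1; i < NR_CPUS; i++)
        if (requested_state[i] < cluster_state)
            cluster_state = requested_state[i];
    return cluster_state;
}

/* Called by a CPU on idle exit. */
void cpu_exit_idle(void)
{
    atomic_fetch_sub(&idle_count, 1);
}
```

For example, if CPU0 asks for state 2 and CPU1, arriving last, asks for state 1, `cpu_enter_idle(1, 1)` returns 1 and no IPI is needed to bring CPU0 along.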
Waking up the cpus that are in the safe state is not done just to
choose the target state; it's done to allow the cpus to take
themselves to the target low power state.  On ux500, are you saying
you take the cpus directly from the safe state to a lower power state
without ever going back to the active state?  I once implemented Tegra
Yes, it is.
But if there is a single power rail for the entire cluster, "preparing" a
CPU for shutdown means saving its context and cleaning L1, possibly for
nothing: if other CPUs are up and running, the CPU going idle could just
enter a simple standby WFI (clock-gated but powered on).

With Colin's approach, context is saved and L1 cleaned only when it is
almost certain that the cluster (and therefore the CPUs) will be powered off.

It is a trade-off. I am not saying one approach is better than the
other; we just have to check whether preparing the CPU for a "possible"
shutdown costs less than sending IPIs to take CPUs out of WFI and synchronize
them (which happens if and only if CPUs enter coupled C-states).

As usual this will depend on use cases (and silicon implementations :) )

It is definitely worth benchmarking them.
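The trade-off can be put as a rough accounting under simplifying assumptions (energy only, no latency; every coupled entry wakes the other N-1 CPUs with IPIs). The symbols are introduced here for illustration only: N is the number of CPUs in the cluster, E_prep the cost of one CPU saving its context and cleaning L1, E_ipi the cost of waking one CPU from WFI with an IPI and resynchronizing it, n the number of idle entries in which a CPU prepares for shutdown but the cluster stays up, and k the number of entries into the cluster-off state. Both schemes pay E_prep for the CPUs that actually lose power, so only the extra work differs:

```latex
\begin{aligned}
W_{\text{prepare}} &= n \, E_{\text{prep}} \\
W_{\text{coupled}} &= k \, (N - 1) \, E_{\text{ipi}} \\
\text{preparing eagerly is cheaper} &\iff n \, E_{\text{prep}} < k \, (N - 1) \, E_{\text{ipi}}
\end{aligned}
```

Benchmarking on real silicon is what would supply the values of n, k, E_prep and E_ipi.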
I'm less worried about performance, and more worried about race
conditions.  How do you deal with the following situation:
CPU0 goes to WFI, and saves its state
CPU1 goes idle, and selects a deep idle state that powers down CPU0
CPU1 saves its state, and is about to trigger the power down
CPU0 gets an interrupt, restores its state, and modifies state (maybe
takes a spinlock during boot)
CPU1 cuts the power to CPU0

On OMAP4, the race is handled in hardware.  When CPU1 tries to cut the
power to the blocks shared with CPU0, the hardware ignores the
request if CPU0 is not in WFI.  On Tegra2, there is no hardware
support and I had to handle it with a spinlock implemented in scratch
registers because CPU0 is out of coherency when it starts booting and
ldrex/strex don't work.  I'm not convinced my implementation is
correct, and I'd be curious to see any other implementations.
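The OMAP4 behaviour Colin describes can be modelled as a check in the power controller. This is a hypothetical C model with names invented for illustration; on OMAP4 the check and the power cut happen atomically in hardware, whereas in software the same check would itself race with CPU0 waking up, which is why Tegra2 needed a lock.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

enum cpu_status { CPU_RUNNING, CPU_IN_WFI };

/* Hypothetical model of the check the OMAP4 power controller does in
 * hardware: a request from `requester` to cut power shared with the other
 * CPUs is ignored (returns false) unless all of them are in WFI. */
bool shared_power_down_request(const enum cpu_status status[NR_CPUS], int requester)
{
    assert(requester >= 0 && requester < NR_CPUS);
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        if (cpu != requester && status[cpu] != CPU_IN_WFI)
            return false;   /* a CPU has woken up: ignore the request */
    return true;            /* every other CPU is in WFI: power can be cut */
}
```

Replaying the sequence above: while CPU0 is in WFI a request from CPU1 is honoured; once CPU0 takes its interrupt, the same request is ignored.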
That's a problem you solved with coupled C-states (i.e. your example in
the cover letter), where the primary waits for the other CPUs to be reset
before issuing the power-down command, right? At that point in time the
secondaries cannot wake up (?), and if WFI (i.e. power down) aborts you just
take the secondaries out of reset and restart executing simultaneously,
correct? It mirrors the suspend behaviour, which is easier to deal with
than completely random idle paths.
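The primary-side sequence described above (posed as a question, so it is a reading of the cover letter rather than a confirmed description) can be sketched as follows. The `pwr_ctrl` structure is a hypothetical stand-in for power-controller registers, CPU0 is assumed to be the primary, and on real hardware a successful power-down would not return but resume through the warm-boot path.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2   /* assumption: CPU0 is the primary, the rest are secondaries */

/* Hypothetical model of a power controller; a real driver would access
 * memory-mapped registers, not C variables. */
struct pwr_ctrl {
    bool in_reset[NR_CPUS];  /* secondary is held in reset */
    bool wakeup_pending;     /* an interrupt is pending: power-down aborts */
};

enum pd_result { PD_NOT_READY, PD_ABORTED, PD_POWERED_DOWN };

/* Run by the primary once all CPUs have agreed on the coupled state. */
enum pd_result primary_power_down(struct pwr_ctrl *pc)
{
    assert(pc);

    /* 1. Do not issue the command until every secondary is in reset: from
     *    then on a secondary cannot wake up and modify shared state. */
    for (int cpu = 1; cpu < NR_CPUS; cpu++)
        if (!pc->in_reset[cpu])
            return PD_NOT_READY;   /* caller polls again */

    /* 2. Issue the power-down (WFI); a pending wakeup makes it abort. */
    if (!pc->wakeup_pending)
        return PD_POWERED_DOWN;    /* real hardware would not return here */

    /* 3. Abort: take the secondaries out of reset so all CPUs restart together. */
    for (int cpu = 1; cpu < NR_CPUS; cpu++)
        pc->in_reset[cpu] = false;
    return PD_ABORTED;
}
```

Because the secondaries are only released by the primary, the abort path is a single, ordered restart rather than CPUs waking at random points, which is what makes it resemble suspend.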

It is true that this should be managed by the PM HW; if the HW cannot
manage these situations, things get nasty, as you highlighted.

It is also true that ldrex/strex on cacheable memory might not be available
in those early warm-boot stages. I came up with a locking algorithm on
strongly-ordered memory to deal with that, but I am still not sure it is
something we really really need.
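The algorithm is not described here. For reference, a classic mutual-exclusion lock that needs only loads and stores (no ldrex/strex or other atomic read-modify-write) is Lamport's bakery algorithm; this is not necessarily the algorithm meant above. In the C11 sketch below, the sequentially consistent `atomic_load`/`atomic_store` calls stand in for accesses to strongly-ordered memory, and `NR_CPUS`, `bakery_acquire` and `bakery_release` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Lamport's bakery lock: only loads and stores, no read-modify-write.
 * A zero-initialized (e.g. static) instance is unlocked. */
struct bakery_lock {
    atomic_bool choosing[NR_CPUS];  /* CPU is picking a ticket */
    atomic_uint number[NR_CPUS];    /* ticket; 0 = not competing */
};

void bakery_acquire(struct bakery_lock *l, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    /* Take a ticket one higher than any ticket currently visible. */
    atomic_store(&l->choosing[cpu], true);
    unsigned max = 0;
    for (int i = 0; i < NR_CPUS; i++) {
        unsigned n = atomic_load(&l->number[i]);
        if (n > max)
            max = n;
    }
    atomic_store(&l->number[cpu], max + 1);
    atomic_store(&l->choosing[cpu], false);

    unsigned mine = max + 1;
    for (int j = 0; j < NR_CPUS; j++) {
        if (j == cpu)
            continue;
        /* Wait until CPU j has finished picking its ticket. */
        while (atomic_load(&l->choosing[j]))
            ;
        /* Wait while CPU j holds a smaller ticket (ties broken by CPU id). */
        for (;;) {
            unsigned theirs = atomic_load(&l->number[j]);
            if (theirs == 0 || theirs > mine || (theirs == mine && j > cpu))
                break;
        }
    }
}

void bakery_release(struct bakery_lock *l, int cpu)
{
    atomic_store(&l->number[cpu], 0);
}
```

One consequence for the idle case: a CPU that loses power while holding a nonzero ticket leaves a stale entry that blocks the others, so its warm-boot path would have to clear its own slot before anything else.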

I will test the coupled C-state code ASAP and come back with feedback.

Thanks,
Lorenzo