Thread (12 messages) 12 messages, 3 authors, 2014-02-17

[PATCH RFC v3 3/3] Documentation: arm: define DT idle states bindings

From: Lorenzo Pieralisi <hidden>
Date: 2014-02-14 11:27:56
Also in: linux-devicetree, linux-pm

On Fri, Feb 14, 2014 at 12:29:26AM +0000, Sebastian Capella wrote:
Quoting Lorenzo Pieralisi (2014-02-13 04:47:32)
quoted
Such as ? "idle-states" ?
That sounds good to me.  Our preference is for idle-states to be used
for name of the idle-states.txt node.
Ok, so s/cpu-idle-state/idle-state everywhere, inclusive of state nodes
compatible properties ("arm,cpu-idle-state" becomes "arm,idle-state") ?
quoted
quoted
During last connect, we'd discussed that the small set of
states here could apply to a single node, which can represent a cpu, a
cluster with cache etc.  Then the complexities of the system power state
would be represented using a heirarchy, with each node in the
tree having its own state from the list above.  This would allow
a fairly rich system state while having just a few power states defined
at each level.  Is this how you're intending these bindings to go?
Yes, CPUs point at states that can be reached by that CPU (eg a CPU core
gating state is represented by a single node pointed at by all CPUs in the
system - it the same state data, different power domains though).

States are hierarchical, which means that a parent state implies entry on all
substates that's how cluster states are defined.
The cpu 0 node, in its cpu-idle-state array is also pointing at a cluster
node right?  STATE0 -> power-domains refers to <pd_clusters 0>.  Is it
correct that if the cpu's workload is such that it tolerates the
500/1000/2500 entry/exit latency/min-residency, it will call the
entry-method with the psci-power-state of 0x10100000 matching that node?
Yes. As we know, since it is a cluster state, that might happen or not
depending on the state of other CPUs.
quoted
quoted
Also, would it be nice to have a name field for each state?
There is:

"The state node name shall be "stateN", where N = {0, 1, ...} is
the node number; state nodes which are siblings within a single
common parent node must be given a unique and sequential N value,
starting from 0."
I can remove the rule and allow people to call states as they wish
since they already have a compatible property to match them.

Actually, states can be called with any name, provided it is unique.
Sorry, I was referring to a descriptive name string property.  Something
that can be useful to help a human understand what's going on.  Naming
the nodes might work too.
I do not like that, but I can remove the naming scheme and let people
find a naming scheme that complies with DT rules (nodes within a parent
must have a unique name). Not sure this would make dts more readable, but
I do not think it is a problem either.
quoted
quoted
quoted
+       A state node can contain state child nodes. A state node with
+       children represents a hierarchical state, which is a superset of
+       the child states. Hierarchical states require all CPUs on which
+       they are valid to request the state in order for it to be entered.
Is it possible for a cpu to request a deeper state and unblock other cpus
from entering this state?
That's not DT bindings business, hierarchical states define states that
require all CPUs on which they are valid to enter that state for it to
be considered "enabled".

It is a hard nut to crack. In x86 this stuff does not exist and it is
managed in HW, basically an idle state is always per-cpu (but it might
end up becoming a package state when all CPUs in a package enter that
state). On ARM, we want to define hierarchical states explicitly to
link resources (ie caches) to them.

CPUs are not prevented from requesting a hierarchical state, but the
state only becomes "enabled" when all CPUs on which it is valid request
it.
The way we've seen it work, is a cpu tries to enter the lowest state it can
tolerate (including for eg. cluster off).  The system level code then looks
at the allowable states from all the cpus and selects the lowest power
state permissible for the heirarchies.

Eg. this way, the last cpu preventing a cluster from going to sleep enters
cluster sleep state (where peer cpu's have already voted for cluster
sleep) and the cluster will sleep.  Alternately if the same cpu cannot
tolerate the additional latency of the cluster sleeping, it will vote
only for cpu-sleep, then the cluster remains in a state where all cpu's are
sleeping but the cluster remains up.
That's exactly what these bindings are meant to describe too and I think they
do. There is also loads of information that can be used for tuning (what
caches are affected by an idle state, what CPUs "share" an idle-state
and so on).
quoted
I cannot think of any other way of to express this properly but still in
a compact way.
You're doing a great job by the way!  Thanks! :)
Thank you, we are almost there.
quoted
quoted
"all CPUs on which they are valid" is this determined by seeing which
state's phandle is in each cpu->cpu-idle-states?
Yes, does it make sense ?
Yup, maybe adding a little parenthetical text like below may help others
as well?

Hierarchical states require all CPUs on which they are valid (ie. CPU nodes
containing cpu-idle-state arrays having a phandle reference to the state)
to request the state in order for it to be entered.
OK, applied.
quoted
quoted
Definition: It represents an idle state index.

        Index 0 and 1 shall not be specified and are reserved for ARM
        states where index 0 is running, and index 1 is idle_standby
        entered by executing a wfi instruction (SBSA,[3])

What mechanism is used to order the power states WRT power consumption?
I think we should use index for that. The higher the index the lower the
power consumption.
Ok, I think I get it now.
If we decouple index and SBSA states, do we really need to reserve index
0 and 1 then?
What I will do: move the entry-method to top-level cpu-idle-states node
(soon to be idle-states node) and add a property there:

"arm,has-idlewfi"

which allows me to prevent people from adding an explicit state that just
executes the wfi instruction on entry.

This way we can have a *global* entry-method, and we can also detect if the
platform supports wfi in its bare form.

Yes, index can start from 0, disallowing 0 and 1 was a (odd) way to prevent
people from adding wfi to DT. If the platform supports simple idlestandby
(ie wfi) it has to add the boolean property above.

How does that sound ?
quoted
quoted
Can this field be used to combine both psci and non-psci states in any order?
No. I will enforce a unique entry method.
So a single entry-method for all  states in idle-states node?
Should this be specified once only then?
Yes, see above.
Should we not allow more flexibility here to mix methods where some
states use one method and some use others?  Is this mostly a security
concern?  What if a vendor wants to introduce a state between wfi and
cpu-sleep?
entry-method-param is the way to differentiate. One interface/entry-method
(PSCI or vendor specific), different state parameters.

We are not installing multiple back-ends to enter different idle states,
are we ? That would be awkward.

ACK ?
quoted
quoted
I assume psci will be turning off the powerdomains not Linux right?
This is not a PSCI only document, and even if it was, we still need to deal
with devices. Which means we need to know what we have to save/restore (PMU,
arch timer, GIC), and power domains help us detect that.
quoted
quoted
quoted
+
+cpus {
+       #size-cells = <0>;
+       #address-cells = <2>;
+
+       cpu-idle-states {
+
+               STATE0: state0 {
+                       compatible = "arm,cpu-idle-state";
+                       index = <3>;
Are the index fields of nested states independent of each other or
sequential?
...
I understand index is misleading and either I remove it, or I
leave it there to define power savings scale as you mentioned.
I like index, but I was confused.  You cleared it up pretty quick with
your earlier comment.  Removing the numbering may help, but I think
keeping the index is useful.
I will leave the index to sort the states according to power consumption.

Thanks !
Lorenzo
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help