Re: cpufreq terminally broken [was Re: community PM requirements/issues and PowerOP]
From: Matthew Locke <hidden>
Date: 2007-02-27 22:41:41
On Feb 27, 2007, at 12:55 PM, David Brownell wrote:
Catching up on some ancient mail from one mbox ...

On Wednesday 13 September 2006 4:50 pm, David Singleton wrote:
OpPoint constructs operating points for all supported frequency, voltage and suspend states for PC and SoC solutions running Linux.

That's one basic issue I have with such approaches to describing operating points: "all" such states gets to be an enormous set. What I've seen of both PowerOP and OpPoint says that they both try to limit that set by just enumerating a handful of specific operating points ... but the more generic solution (generally matching chip specs) would be having a way to constrain the parameters within their natural limits. (Rather than picking out a set of half a dozen system modes in advance, by hand.)
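To make the contrast concrete, here is a minimal sketch of the two styles: a hand-picked table of operating points versus per-parameter limits taken from a chip spec. Everything in it is an assumption made for illustration (the parameter names `cpu_mhz` and `core_mv`, the numeric limits, and the helper `within_limits`); none of it comes from PowerOP, OpPoint or any kernel API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    """Inclusive range a single parameter may take (hypothetical values)."""
    low: float
    high: float

    def allows(self, value: float) -> bool:
        return self.low <= value <= self.high


# Assumed chip-spec limits; placeholders, not real hardware numbers.
CHIP_LIMITS = {
    "cpu_mhz": Limit(100, 600),
    "core_mv": Limit(900, 1400),
}

# The enumerated style: a handful of system modes picked in advance, by hand.
ENUMERATED_POINTS = [
    {"cpu_mhz": 100, "core_mv": 900},
    {"cpu_mhz": 300, "core_mv": 1100},
    {"cpu_mhz": 600, "core_mv": 1400},
]


def within_limits(request: dict) -> bool:
    """True if every requested parameter lies inside its natural limit."""
    return all(CHIP_LIMITS[name].allows(value) for name, value in request.items())


# A setting nobody enumerated, yet one the (assumed) chip limits permit.
request = {"cpu_mhz": 450, "core_mv": 1250}
in_table = request in ENUMERATED_POINTS
in_limits = within_limits(request)
print(in_table, in_limits)
```

The table rejects the request simply because nobody listed it; the limit check accepts it because each parameter is inside its range. Limits on single parameters do not capture cross-parameter rules, which is a separate problem.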
Agreed, well, mostly anyway :) Eugeny and I went back to the drawing board to see what we could do based on last year's comments, and specifically Dominik's "Alternative concept" email. Basically, we agree that the operating point notion is too limiting and artificial to be the basis for a power management stack. Something like the knob layer described in Dominik's email is needed. We have done a bit of thinking on the necessary behavior and features of such a layer. Funnily enough, I was in the middle of writing up our thoughts in an email to the pm list when your email came in. I will finish up that email and then come back to your specific examples below.
- With CPU clock at AAA MHz, chip voltage K must be between A1 and A2 volts; but those other clocks only need to follow their usual rules. This defines a set of many operating points.

- The MMC driver needs to have power supply P output 3.3 V at 80 mA and have clock D active. (Presumably, a different set of operating points.)

- If clock D is active, the SOC chip can't enter power state X; but again, other clocks can use their normal rules. Again, many operating points.

- While UART U3 is set to 115200 baud, certain values of clock M aren't allowed; but there are no other constraints, so that many operating points are compatible with that configuration.

- Those chips must be in power state X for that module to enter power state Y; other modules can be in any power state.

- That module must be in power state Y for the system to enter power state Z.

- Because of chip errata, <these> parameter combinations (or transitions) are invalid; don't trigger them.

Cataloguing every possible power-related parameter seems like a losing game, even on relatively tiny systems which pay attention to power usage from within each driver ... and doomed to failure in larger scenarios, like that 256-core case.

It seems that I'm actually criticizing the notion of "operating point" as a model to expose as a power management target ...

It's simple to say that the system is at a particular operating point, and that it's an operating point that works well for MP4 playback. That's like saying "it's warm today"; there are many kinds of warm day. It's purely descriptive, and omits lots of relevant details. (Rainy too?)

But I really can't think it would be common for that to be the _only_ such operating point ... simple counter-examples include the MMC and UART cases above, considering that playback could often work with or without MMC active, with or without UART at 115200 baud.
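The counter-example can be sketched by writing a few of the constraints above as predicates over a small parameter space and counting the points that remain usable for playback. This is only an illustration under loud assumptions: the parameter names, the candidate values, the concrete voltage window, the disallowed clock M value and the `supports_playback` rule are all invented placeholders standing in for AAA, A1, A2 and friends.

```python
from itertools import product

# Hypothetical parameter space; every name and value is a placeholder.
SPACE = {
    "cpu_mhz": [200, 400],
    "core_mv": [1000, 1100, 1200, 1300],
    "clock_d": [False, True],
    "mmc_active": [False, True],
    "uart_baud": [9600, 115200],
    "clock_m_mhz": [12, 13, 48],
    "soc_state": ["run", "X"],
}


def cpu_voltage_ok(p):
    # At the fast CPU clock, core voltage must sit in an assumed window.
    return p["cpu_mhz"] != 400 or 1200 <= p["core_mv"] <= 1300


def mmc_ok(p):
    # An active MMC needs clock D running.
    return p["clock_d"] or not p["mmc_active"]


def state_x_ok(p):
    # With clock D active, the SoC cannot be in power state X.
    return not (p["clock_d"] and p["soc_state"] == "X")


def uart_ok(p):
    # At 115200 baud, one assumed clock M value is not allowed.
    return not (p["uart_baud"] == 115200 and p["clock_m_mhz"] == 13)


CONSTRAINTS = [cpu_voltage_ok, mmc_ok, state_x_ok, uart_ok]


def valid_points():
    """Yield every parameter combination that satisfies all constraints."""
    names = list(SPACE)
    for values in product(*(SPACE[n] for n in names)):
        point = dict(zip(names, values))
        if all(check(point) for check in CONSTRAINTS):
            yield point


def supports_playback(p):
    # Assumed requirement: fast CPU clock and a running SoC.
    return p["cpu_mhz"] == 400 and p["soc_state"] == "run"


playback_points = [p for p in valid_points() if supports_playback(p)]
print(len(playback_points))
```

Even in this toy space, the playback-capable set contains points with MMC active and inactive, and with the UART at either baud rate, so no single point is "the" playback operating point.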
Ergo, multiple operating points support MP4 playback, ergo "operating point" isn't the key notion that would need to be exposed. QED.

Now, where does that leave us? I think it leaves us looking at how those constraints get expressed (by e.g. device drivers for clocks and voltages, ditto cpufreq drivers) and to what they get expressed (clock framework, voltage framework, maybe a CPU horsepower manager the scheduler talks to).

So for that MP4 example, one could alert the video driver to do its clock setup, a horsepower manager to say "intensive software decode load coming up for this RT task", and one of the relevant operating points would be entered. And if the video data were coming from the MMC card, a slightly different one would be entered as part of starting that data stream; etc.

Or in the 256-cpu example, just alert the horsepower manager that a huge simulation job is upcoming ... that is, if the scheduler doesn't just do that automatically when it notices it'd be a nice time to bring up some of those currently-downed cores. (Automatic up/down of cores may not behave well in all cases. By analogy: madvise gives VM advice that can't be guessed by the kernel; schedulers may need similar advice.)

Comments?

- Dave

_______________________________________________
linux-pm mailing list
linux-pm@lists.osdl.org
https://lists.osdl.org/mailman/listinfo/linux-pm