Thread (61 messages) 61 messages, 6 authors, 2018-08-07

Re: [PATCH v5 05/12] PM / devfreq: Add support for policy notifiers

From: Chanwoo Choi <cw00.choi@samsung.com>
Date: 2018-08-06 22:16:15
Also in: linux-pm, lkml

Hi Matthias,

On 2018년 08월 07일 03:46, Matthias Kaehlcke wrote:
Hi Chanwoo,

On Fri, Aug 03, 2018 at 08:56:57AM +0900, Chanwoo Choi wrote:
quoted
Hi Matthias,

On 2018년 08월 03일 08:13, Matthias Kaehlcke wrote:
quoted
Hi Chanwoo,

On Thu, Aug 02, 2018 at 10:58:59AM +0900, Chanwoo Choi wrote:
quoted
Hi Matthias,

On 2018년 08월 02일 02:08, Matthias Kaehlcke wrote:
quoted
Hi Chanwoo,

On Wed, Aug 01, 2018 at 10:22:16AM +0900, Chanwoo Choi wrote:
quoted
On 2018년 08월 01일 04:39, Matthias Kaehlcke wrote:
quoted
On Mon, Jul 16, 2018 at 10:50:50AM -0700, Matthias Kaehlcke wrote:
quoted
On Thu, Jul 12, 2018 at 05:44:33PM +0900, Chanwoo Choi wrote:
quoted
Hi Matthias,

On 2018년 07월 07일 02:53, Matthias Kaehlcke wrote:
quoted
Hi Chanwoo,

On Wed, Jul 04, 2018 at 03:41:46PM +0900, Chanwoo Choi wrote:
quoted
Firstly,
I'm not sure why devfreq needs the devfreq_verify_within_limits() function.

devfreq already used the OPP interface as default. It means that
the outside of 'drivers/devfreq' can disable/enable the frequency
such as drivers/thermal/devfreq_cooling.c. Also, when some device
drivers disable/enable the specific frequency, the devfreq core
consider them.

So, devfreq doesn't need to devfreq_verify_within_limits() because
already support some interface to change the minimum/maximum frequency
of devfreq device. 

In case of cpufreq subsystem, cpufreq only provides 'cpufreq_verify_with_limits()'
to change the minimum/maximum frequency of cpu. some device driver cannot
change the minimum/maximum frequency through OPP interface.

But, in case of devfreq subsystem, as I explained already, devfreq support
the OPP interface as default way. devfreq subsystem doesn't need to add
other way to change the minimum/maximum frequency.
Using the OPP interface exclusively works as long as a
enabling/disabling of OPPs is limited to a single driver
(drivers/thermal/devfreq_cooling.c). When multiple drivers are
involved you need a way to resolve conflicts, that's the purpose of
devfreq_verify_within_limits(). Please let me know if there are
existing mechanisms for conflict resolution that I overlooked.

Possibly drivers/thermal/devfreq_cooling.c could be migrated to use
devfreq_verify_within_limits() instead of the OPP interface if
desired, however this seems beyond the scope of this series.
Actually, if we uses this approach, it doesn't support the multiple drivers too.
If non throttler drivers uses devfreq_verify_within_limits(), the conflict
happen.
As long as drivers limit the max freq there is no conflict, the lowest
max freq wins. I expect this to be the usual case, apparently it
worked for cpufreq for 10+ years.

However it is correct that there would be a conflict if a driver
requests a min freq that is higher than the max freq requested by
another. In this case devfreq_verify_within_limits() resolves the
conflict by raising p->max to the min freq. Not sure if this is
something that would ever occur in practice though.

If we are really concerned about this case it would also be an option
to limit the adjustment to the max frequency.
quoted
To resolve the conflict for multiple device driver, maybe OPP interface
have to support 'usage_count' such as clk_enable/disable().
This would require supporting negative usage count values, since a OPP
should not be enabled if e.g. thermal enables it but the throttler
disabled it or viceversa.

Theoretically there could also be conflicts, like one driver disabling
the higher OPPs and another the lower ones, with the outcome of all
OPPs being disabled, which would be a more drastic conflict resolution
than that of devfreq_verify_within_limits().

Viresh, what do you think about an OPP usage count?
Ping, can we try to reach a conclusion on this or at least keep the
discussion going?

Not that it matters, but my preferred solution continues to be
devfreq_verify_within_limits(). It solves conflicts in some way (which
could be adjusted if needed) and has proven to work in practice for
10+ years in a very similar sub-system.
It is not true. Current cpufreq subsystem doesn't support external OPP
control to enable/disable the OPP entry. If some device driver
controls the OPP entry of cpufreq driver with opp_disable/enable(),
the operation is not working. Because cpufreq considers the limit
through 'cpufreq_verify_with_limits()' only.
Ok, we can probably agree that using cpufreq_verify_with_limits()
exclusively seems to have worked well for cpufreq, and that in their
overall purpose cpufreq and devfreq are similar subsystems.

The current throttler series with devfreq_verify_within_limits() takes
the enabled OPPs into account, the lowest and highest OPP are used as
a starting point for the frequency adjustment and (in theory) the
frequency range should only be narrowed by
devfreq_verify_within_limits().
quoted
As I already commented[1], there is different between cpufreq and devfreq.
[1] https://lkml.org/lkml/2018/7/4/80

Already, subsystem already used OPP interface in order to control
specific OPP entry. I don't want to provide two outside method
to control the frequency of devfreq driver. It might make the confusion.
I understand your point, it would indeed be preferable to have a
single method. However I'm not convinced that the OPP interface is
a suitable solution, as I exposed earlier in this thread (quoted
below).

I would like you to at least consider the possibility of changing
drivers/thermal/devfreq_cooling.c to devfreq_verify_within_limits().
Besides that it's not what is currently used, do you see any technical
concerns that would make devfreq_verify_within_limits() an unsuitable
or inferior solution?
As we already discussed, devfreq_verify_within_limits() doesn't support
the multiple outside controllers (e.g., devfreq-cooling.c).
That's incorrect, its purpose is precisely that.

Are you suggesting that cpufreq with its use of
cpufreq_verify_within_limits() (the inspiration for
devfreq_verify_within_limits()) is broken? It is used by cpu_cooling.c
and other drivers when receiving a CPUFREQ_ADJUST event, essentially
what I am proposing with DEVFREQ_ADJUST.

Could you elaborate why this model wouldn't work for devfreq? "OPP
I don't mention that this model is not working. As I already commented[1],
devfreq used OPP interface to control OPP entry on outside of devfreq driver.
Because devfreq used OPP interface, I hope to provide only OPP method
to control the frequency on outside of devfreq.
[1] https://lkml.org/lkml/2018/7/4/80
quoted
interface is mandatory for devfreq" isn't really a technical argument,
is it mandatory for any other reason than that it is the interface
that is currently used?
In case of controlling the frequency, OPP interface is mandatory for devfreq.

cpufreq used cpufreq_verify_within_limit(). If outside driver disable
specific OPP entry, cpufreq don't consider them because after getting the frequency
from devicetree, cpufreq don't use the OPP interface for disabling/enabling.
Only if outside driver used cpufreq_verify_within_limit(), cpufreq consider
the range of minimum/maximum frequency. cpufreq core doesn't use 'dev_pm_opp_find_*'
function. It means that cpufreq code doesn't consider the statue of opp_diable/enable.

devfreq used OPP interface.  If outside driver disable specific OPP entry, devfreq consider them. 
What exactly is this 'outside driver' you are referring? The driver
that 'owns' the devfreq device, e.g. a GPU driver? Or just any
non-devfreq driver, like devfreq-cooling.c?

If it's the first case then this isn't currently working as intended
when the devfreq device is used as a cooling device, since the cooling
device would overwrite the state set by the 'owner' in
partition_enable_opps().
quoted
When find available minimum frequency, devfreq used OPP interface. (find_available_min_freq)
When find available maximum frequency, devfreq used OPP interface. (find_available_max_freq)
When make freq_table of devfreq device, devfreq used OPP interface. (set_freq_table)
When outside driver disable or enable OPP entry, devfreq receives the notification
 from OPP interface and then update the scaling_min_freq/scaling_max_freq by using
 OPP interface. (devfreq_notifier_call)
At this point of using scaling_min_freq/scaling_max_freq on devfreq, it indicates
that devfreq used OPP interface because devfref tried to find scaling_min_freq/scaling_max_freq
through OPP interface.

If outside driver use OPP interface in order to control frequency,
devfreq core is well working without any modification of devfreq
core.
Thanks for elaborating!

I understand that this is how it currently works, but unless I'm
missing something about the outside driver disabling an OPP I still
essentially read this as 'the OPP interface is mandatory because it's
what is currently used by the devfreq core to limit the frequency
range', rather than that using the OPP interface allows to provide a
particular feature or is inherently better in some other way.

I don't propose to completely strip the OPP interface out of devfreq,
but mainly to switch devfreq-cooling.c to
devfreq_verify_within_limits() to avoid having two mechanisms for
limiting the frequency range. Besides being simpler this would allow
to support the case where the 'owner' disables a certain OPP and
devfreq respects that. The code required in the devfreq core to
support this would be minimal (this patch).
quoted
quoted
quoted
After you are suggesting the throttler core, there are at least two
outside controllers (e.g., devfreq-cooling.c and throttler driver).
As I knew the problem about conflict, I cannot agree the temporary
method. OPP interface is mandatory for devfreq in order to control
the OPP (frequency/voltage). In this situation, we have to try to
find the method through OPP interface.
What do you mean with "temporary method"?
this expression might be not proper. Please ignore this expression.
quoted
We can try to find a method through the OPP interface, but at this
point I'm not convinced that it is technically necessary or even
preferable.
I replied it about this as following.
quoted
Another inconvenient of the OPP approach for both devfreq-cooling.c
and the throttler is that they have to bother with disabling all OPPs
above/below the max/min (they don't/shouldn't have to care), instead
of just telling devfreq the max/min.
I think it doesn't matter. We can enable/disable the OPP entry by traversing.
partition_enable_opps() in drivers/thermal/devfreq-cools.c have already done so.
quoted
quoted
We can refer to regulator/clock. Multiple device driver can use
the regulator/clock without any problem. I think that usage of OPP
is similiar with regulator/clock. As you mentioned, maybe OPP
would handle the negative count. Although opp_enable/opp_disable()
have to handle the negative count and opp_enable/opp_disable()
can support the multiple usage from device drivers, I think that
this approach is right.
The regulator/clock approach with the typical usage counts seems more
intuitive to me, personally I wouldn't write an interface with
negative usage count if I could reasonably avoid it.
I think the use of negative usage count is not problem if it's required.
quoted
quoted
quoted
quoted
I want to use only OPP interface to enable/disable frequency
even if we have to modify the OPP interface.
These are the concerns I raised earlier about a solution with OPP
usage counts:

"This would require supporting negative usage count values, since a OPP
should not be enabled if e.g. thermal enables it but the throttler
disabled it or viceversa.
Already replied	about negative usage count. I think that negative usage count
is not problem if this approach could resolve the issue.
quoted
Theoretically there could also be conflicts, like one driver disabling
the higher OPPs and another the lower ones, with the outcome of all
OPPs being disabled, which would be a more drastic conflict resolution
than that of devfreq_verify_within_limits()."

What do you think about these points?
It depends on how to use OPP interface on multiple device driver.
Even if devfreq/opp provides the control method, outside device driver
are misusing them. It is problem of user.
I wouldn't call it misusing if two independent drivers take
contradictory actions on an interface that doesn't provide
arbitration. How can driver A know that it shouldn't disable OPPs a, b
and c because driver B disabled d, e and f? Who is misusing the
interface, driver A or driver B?
Each outside driver has their own throttling policy to control OPP entries.
They don't care the requirement of other driver and cannot know the requirement
of other driver. devfreq core can only recognize them.
quoted
quoted
Instead, if we use the OPP interface, we can check why OPP entry
is disabled or enabled through usage count.
quoted
The negative usage counts aren't necessarily a dealbreaker in a
technical sense, though I'm not a friend of quirky interfaces that
don't behave like a typical user would expect (e.g. an OPP isn't
necessarily enabled after dev_pm_opp_enable()).

I can sent an RFC with OPP usage counts, though due to the above
concerns I have doubts it will be well received.
Please add me to Cc list.
Will do
OK. Thanks.
This might take a bit for a few reasons. Before posting anything I
would like to experiment a bit with it and find time to do so between
other tasks (admittedly I'm also procrastinating a bit, because I'm
unconvinced). And I will be out of office for two weeks starting
nextweek, it's probably not the best to post and then disapear from
the discussion. I might post the RFC if I can advance it in the next
48 hours, otherwise I think it is better to delay until I'm back from
vacation.
I agree you better to do this after your vacation.


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help