Thread (81 messages) 81 messages, 15 authors, 2014-01-14

Re: [lm-sensors] [PATCHv9 02/20] thermal: introduce device tree parser

From: Eduardo Valentin <hidden>
Date: 2013-11-18 14:46:31
Also in: linux-pm, lkml

Hello Jean,

I will try to complement what Rui's already commented.

On 18-11-2013 02:04, Zhang Rui wrote:
On 五, 2013-11-15 at 09:07 +0100, Jean Delvare wrote:
quoted
Hi Eduardo,

On Tue, 12 Nov 2013 15:46:04 -0400, Eduardo Valentin wrote:
quoted
This patch introduces a device tree bindings for
describing the hardware thermal behavior and limits.
Also a parser to read and interpret the data and feed
it in the thermal framework is presented.

This patch introduces a thermal data parser for device
tree. The parsed data is used to build thermal zones
and thermal binding parameters. The output data
can then be used to deploy thermal policies.

This patch adds also documentation regarding this
API and how to define tree nodes to use
this infrastructure.

Note that, in order to be able to have control
on the sensor registration on the DT thermal zone,
it was required to allow changing the thermal zone
.get_temp callback. For this reason, this patch
also removes the 'const' modifier from the .ops
field of thermal zone devices.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Eduardo Valentin <redacted>
---
(...)
+static int of_thermal_get_trend(struct thermal_zone_device *tz, int trip,
+				enum thermal_trend *trend)
+{
+	struct __thermal_zone *data = tz->devdata;
+	long dev_trend;
+	int r;
+
+	if (!data->get_trend)
+		return -EINVAL;
+
+	r = data->get_trend(data->sensor_data, &dev_trend);
+	if (r)
+		return r;
+
+	/* TODO: These intervals might have some thresholds, but in core code */
+	if (dev_trend > 0)
+		*trend = THERMAL_TREND_RAISING;
+	else if (dev_trend < 0)
+		*trend = THERMAL_TREND_DROPPING;
+	else
+		*trend = THERMAL_TREND_STABLE;
+
+	return 0;
+}
I don't like the whole trend thing.

I've been writing hwmon drivers for the past decade and I've never seen
a thermal sensor device being able to report trends. And as a rule of
thumb, if the hardware can't do it then the (hardware-specific) drivers
should not report it.
In fact, I would agree with you that it is not common to see such
devices. What I've already seen is that HW provides is usually artifacts
that may help on computing such statistic data. And in that case it may
be the case that the data coming from the HW may have less noise, such
as system overhead, than that computed by the caller.

However, it is also the case that such artifacts do not work without a
proper platform dependent configuration step, such as update interval
and size of the history window.
I agree that a sensor device should not be able to report trends. But
currently, the thermal zone driver is not only a sensor driver, it also
owns the platform thermal cooling strategy.
And some platforms do have this kind of knowledge, right?
e.g. ACPI spec provides an equation for processor passive cooling (ACPI
Spec 5.0 11.1.5).
Just a complementary point here, The design of current thermal framework
is so that get_trend is zone specific, and thus, it has to consider the
cooling strategy owned by the driver, like the update interval. And
besides, in case the driver does not know how to compute the trend, the
framework, the caller of that callback, also provides default behavior.

The purpose of this series is to split the data that is platform
dependent and allow designers to write those data points in device tree.
That is why the "glue layer" still keeps the get_trend.

quoted
Hwmon devices (and drivers) report temperature values, and sometimes
historical min/max. They can do that because these are facts that need
no interpretation.

Defining a trend, however, requires care and, more importantly,
decisions.

Maybe your concern is the fact that it would be very hard to provide a
trend from a hwmon driver? And that I agree, unless the device provides
artifacts. But that is not the case for all users of this API. For
instance, some versions of TI SoC bandgap sensors may have historical
buffers and cumulative values. But obviously, these features would work
only if proper configuration is agreed.
We had the assumption that we will get an interrupt when the firmware
thinks there is an temperature change that should be noticed by OS.
quoted
 For example, consider a thermal sensor which reports 50°C at
time t, then 47°C at time t+3s,
does firmware thinks this should be noticed by OS?
If no, there is no interrupt and this temperature change is totally
transparent to the OS at all.
If yes, the thermal core will check if the system is overheating, if
yes, it throttles the devices, and if no, do nothing.
quoted
 then 48°C at time t+6s. At t+7s someone
asks for the trend,
the trend is needed when platform thermal driver calls
thermal_zone_device_update(), and mostly following a temperature change
interrupt.
quoted
 what should the driver reply?
 If you consider only
the last 5 seconds, temperature is raising. If you consider the last 10
seconds instead, temperature is dropping. How would the driver know
what time frame the caller is interested in?

Another example: "small" temperature variations. If temperatures varies
by 1°C in my kitchen's oven, I'd call it stable. If my body temperature
varies by 1°C, I'd call it non-stable, and my doctor for an appointment
also ;-)
quoted
My point is that only the caller, and not the driver, knows how to
convert a series of measurements into a trend. So I don't think drivers
Agreed here. However, the caller can also instruct the device to help on
the computation based on historical values. And historical values can be
configured also by the caller.
quoted
should implement anything like get_trend(). Whatever piece of code
needs to establish a trend should call get_temp() repeatedly, store the
results and do its own analysis of the data.
I think the key problem is that,
the thermal subsystem just provides a basic/minimum thermal control
framework in kernel, and it is totally platform independent. And any
platform driver can use this framework to do some basic thermal control
easily but just telling thermal core if the thermal core should take
some cooling actions (trip points + trend + current temperature) and how
(by choosing thermal governors).

So if you want to do more precise thermal control, you'd better disable
kernel thermal control and do it in userspace. In this case,
the .get_trend() will never be invoked at all.
Further more, I'm indeed considering introduce platform specific thermal
governors, so that platform thermal driver can tell thermal core what to
do at a certain point.

thanks,
rui


-- 
You have got to be excited about what you are doing. (L. Lamport)

Eduardo Valentin

Attachments

Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help