Thread (26 messages) 26 messages, 7 authors, 2016-12-15

Re: [PATCH V2 09/10] thermal: da9062/61: Thermal junction temperature monitoring driver

From: Eduardo Valentin <edubezval@gmail.com>
Date: 2016-11-30 06:10:12
Also in: linux-input, linux-pm, linux-watchdog, lkml

Hey Steve,

On Tue, Nov 29, 2016 at 11:11:59AM +0000, Steve Twiss wrote:
Hi Eduardo,

Thanks for your response.

On 29 November 2016 01:24, Eduardo Valentin, wrote:
quoted
On Wed, Oct 26, 2016 at 05:56:39PM +0100, Steve Twiss wrote:
quoted
+config DA9062_THERMAL
+	tristate "DA9062/DA9061 Dialog Semiconductor thermal driver"
+	depends on MFD_DA9062
+	depends on OF
+	help
+	  Enable this for the Dialog Semiconductor thermal sensor driver.
+	  This will report PMIC junction over-temperature for one thermal trip
+	  zone.
+	  Compatible with the DA9062 and DA9061 PMICs.
Is there any hardware documentation available for this chip that can be
pointed out?
As part of this patch set, I added a link to the the datasheet into the top-level MFD
component of the device tree binding: https://patchwork.kernel.org/patch/9426791/
You can get all the information for DA9062 and DA9061 from the patch update in that
link.
Thanks for getting me up to speed.
[...]
<cut>
quoted
Does this mean that the chip temperature is above or equal to 125C, is
this really a safe temperature to keep it running?
There is more information in the datasheet under the section titles "Junction
temperature supervision". The value of 125 degC comes from the "warning
temperature threshold" and the event is triggered when "the junction temperature
rises above the first threshold (TEMP_WARN), the event E_TEMP is asserted".

This suggests strictly greater than 125. However, there is no way for the chip to
know the exact temperature. The mechanism is interrupt based and triggering
only happens when the temperature rises above the threshold level.
Understood. My original concern was if the driver would be too nice with
the event. But given that the hardware has its own shutdown protection,
looks like we are OK here.
[...]
quoted
quoted
+static irqreturn_t da9062_thermal_irq_handler(int irq, void *data)
+{
+	struct da9062_thermal *thermal = data;
+
+	disable_irq_nosync(thermal->irq);
+	schedule_delayed_work(&thermal->work, 0);
Can you please elaborate a little on why you have an one shot threaded IRQ
handler that is only disabling the IRQ then, scheduling a work with delay of 0?

To my understanding, this is exactly what you get when you run your
threaded IRQ handler, when you configure it as one shot.

Have you tried simply handling the same work done in your workqueue
handler in your threaded IRQ? That should simplify your code and get the
same result you are expecting.

If you are not getting the IRQ disabled on the threaded IRQ when
configured as one shot, something else seams to be broken.
Over-temperature triggering is event based: an interrupt happens when the
temperature rises above 125 degC. However, this event based system changes into
a polling operation to detect when the temperature falls below the threshold
level again. This asymmetry in the chip's behaviour is the reasoning behind
why I am not using the thermal core's built-in polling functionality.

When over-temperature is reached, the interrupt from the DA9061/2 will be
repeatedly triggered. The IRQ is disabled when the first over-temperature event
happens and the status bit is polled using the work-queue until it becomes false.
After that, the IRQ is re-enabled again so a new critical temperature event can
be detected.

After the interrupt has happened, event bit for the interrupt switches into a status
bit and is used for polling and detecting when the temperature drops below the
threshold.
OK. got it. Then, I am assuming your strategy here is to keep periodically issuing uevents
(HOT trip point) to userland, hoping that something would change the
system power consumption, then, relying on the hardware shutdown protection
if nothing happens.

I would say, your above explanation, and the uevent based strategy,
deserves to be at least in the commit message, preferably in the driver
documentation, so people know what to expect from the driver.

Now, if my understand is correct, would it make sense to have still a
failsafe mechanism in the driver? Maybe having a max number of polling?

The data sheet does not mention anything, but does one have any silicon
lifetime implication if one leaves the PMIC running for very long time
in a temperature between Twarn and Tcrit?
https://lkml.org/lkml/2016/10/20/372
https://lkml.org/lkml/2016/10/20/433

[...]
quoted
quoted
+	thermal->zone = thermal_zone_device_register(thermal->config->name,
+					1, 0, thermal,
+					&da9062_thermal_ops, NULL, 0,
+					0);
Did you try using of-thermal?

So you would avoid having specific DT properties for something that
there is already a defined property?
In an earlier RFC, I examined a work-around by hijacking and toggling the
thermal core's built-in polling function when I needed to poll the temperature
through get_temp(). However I thought this was quite dangerous, since it would
not be using a formal thermal core interface.

https://patchwork.kernel.org/patch/9387241/
my point was more under the lines of having this driver using the
of-thermal DT support to get the polling value, instead of you having a
manufacturer specific property:
Documentation/devicetree/bindings/thermal/thermal.txt

But given that your case is more specific, I start to see why you
avoided using it. But still, you could probably get the polling
values from of-thermal, populated in the tz struct, then using them from
the tz, when handling the IRQ event.

Probably your regular polling should always be set to 0, and the passive
polling to the value you want.

then again, this comment is more from a DT perspective than from the
driver code itself. Just trying to avoid specific properties that may
get described by what is already defined.
[...]
quoted
quoted
+	ret = request_threaded_irq(thermal->irq, NULL,
+				   da9062_thermal_irq_handler,
+				   IRQF_TRIGGER_LOW | IRQF_ONESHOT,
+				   "THERMAL", thermal);
How about using the devm_ version?
I wanted to explicitly free_irq() before calling thermal_zone_device_unregister()
in the remove function.
Ok
Regards,
Steve
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help