RE: [PATCH 2/6] thermal: of: Export non-devres helper to register/unregister thermal zone
From: Biju Das <biju.das.jz@bp.renesas.com>
Date: 2025-01-30 10:33:28
Also in:
linux-clk, linux-devicetree, linux-pm, linux-renesas-soc, lkml
Hi Daniel Lezcano,
-----Original Message----- From: Daniel Lezcano <redacted> Sent: 30 January 2025 10:07 Subject: Re: [PATCH 2/6] thermal: of: Export non-devres helper to register/unregister thermal zone On Thu, Jan 30, 2025 at 11:08:03AM +0200, Claudiu Beznea wrote:quoted
Hi, Daniel, On 29.01.2025 19:24, Daniel Lezcano wrote:quoted
Hi Claudiu, On Fri, Jan 03, 2025 at 06:38:01PM +0200, Claudiu wrote:quoted
From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> On the Renesas RZ/G3S (and other Renesas SoCs, e.g., RZ/G2{L, LC, UL}), clocks are managed through PM domains. These PM domains, registered on behalf of the clock controller driver, are configured with GENPD_FLAG_PM_CLK. In most of the Renesas drivers used by RZ SoCs, the clocks are enabled/disabled using runtime PM APIs. During probe, devices are attached to the PM domain controlling their clocks. Similarly, during removal, devices are detached from the PM domain. The detachment call stack is as follows: device_driver_detach() -> device_release_driver_internal() -> __device_release_driver() -> device_remove() -> platform_remove() -> dev_pm_domain_detach() In the upcoming Renesas RZ/G3S thermal driver, the struct thermal_zone_device_ops::change_mode API is implemented to start/stop the thermal sensor unit. Register settings are updated within the change_mode API. In case devres helpers are used for thermal zone register/unregister the struct thermal_zone_device_ops::change_mode API is invoked when the driver is unbound. The identified call stack is as follows: device_driver_detach() -> device_release_driver_internal() -> device_unbind_cleanup() -> devres_release_all() -> devm_thermal_of_zone_release() -> thermal_zone_device_disable() -> thermal_zone_device_set_mode() -> rzg3s_thermal_change_mode() The device_unbind_cleanup() function is called after the thermal device is detached from the PM domain (via dev_pm_domain_detach()). The rzg3s_thermal_change_mode() implementation calls pm_runtime_resume_and_get()/pm_runtime_put_autosuspend() before/after accessing the registers. However, during the unbind scenario, the devm_thermal_of_zone_release() is invoked after dev_pm_domain_detach(). Consequently, the clocks are not enabled, as the device is removed from the PM domain at this time, leading to an Asynchronous SError Interrupt. The system cannot be used after this.I've been through the driver before responding to this change. What is the benefit of powering down / up (or clock off / on) the thermal sensor when reading the temperature ? I can understand for disable / enable but I don't get for the classic usage where a governor will be reading the temperature regularly.I tried to be as power saving as possible both at runtime and after the IP is not used anymore as the HW manual doesn't mentioned anything about accuracy or implications of disabling the IP clock at runtime. We use similar approach (of disabling clocks at runtime) for other IPs in the RZ/G3S SoC as well.quoted
Would the IP need some cycles to capture the temperature accurately after the clock is enabled ?There is nothing about this mentioned about this in the HW manual of the RZ/G3S SoC. The only points mentioned are as described in the driver code: - wait at least 3us after each IIO channel read - wait at least 30us after enabling the sensor - wait at least 50us after setting OE bit in TSU_SM For this I chose to have it implemented as proposed.IMO, disabling/enabling the clock between two reads through the pm runtime may not be a good thing, especially if the system enters a thermal situation where it has to mitigate.
Just a question, You mean to avoid device destruction due to high temperature?? Assuming disabling the clk happens when the temp reaches the boundary and re-enabling of the clk after a time(which involves monitoring the CLK ON bit after enabling it, or a run time enable failure happens), where it exceeds the threshold?? Cheers, Biju