Re: [PATCH v3 4/7] thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
From: Dmitry Osipenko <digetx@gmail.com>
Date: 2021-06-15 19:32:48
Also in:
linux-pm, linux-tegra, lkml
15.06.2021 19:18, Daniel Lezcano пишет:
On 15/06/2021 15:01, Dmitry Osipenko wrote:quoted
15.06.2021 13:26, Viresh Kumar пишет:quoted
On 15-06-21, 12:03, Daniel Lezcano wrote:quoted
[Cc Viresh] On 29/05/2021 19:09, Dmitry Osipenko wrote:quoted
All NVIDIA Tegra30 SoCs have a two-channel on-chip sensor unit which monitors temperature and voltage of the SoC. Sensors control CPU frequency throttling, which is activated by hardware once preprogrammed temperature level is breached, they also send signal to Power Management controller to perform emergency shutdown on a critical overheat of the SoC die. Add driver for the Tegra30 TSENSOR module, exposing it as a thermal sensor and a cooling device.IMO it does not make sense to expose the hardware throttling mechanism as a cooling device because it is not usable anywhere from the thermal framework. Moreover, that will collide with the thermal / cpufreq framework mitigation (hardware sets the frequency but the software thinks the freq is different), right ?H/w mitigation is additional and should be transparent to the software mitigation. The software mitigation is much more flexible, but it has latency. Software also could crash and hang. Hardware mitigation doesn't have latency and it will continue to work regardless of the software state.Yes, I agree. Both solutions have their pros and cons. However, I don't think they can co-exist sanely.quoted
The CCF driver is aware about the h/w cooling status [1], hence software sees the actual frequency. [1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit?id=344d5df34f5abd468267daa98f041abf90b2f660Ah interesting, thanks for the pointer. What I'm worried about is the consistency with cpufreq. Probably cpufreq_update_limits() should be called from the interrupt handler.
IIUC, the cpufreq already should be prepared for the case where firmware may override frequency. Viresh, could you please clarify what are the possible implications of the frequency overriding?
quoted
quoted
I am not even sure what the cooling device is doing here: tegra_tsensor_set_cur_state() is not implemented and it says hardware changed it by itself. What is the benefit you are getting out of the cooling device here ?It allows userspace to check whether hardware cooling is active via the cooling_device sysfs. Otherwise we don't have ability to check whether h/w cooling is active, I think it's a useful information. It's also interesting to see the cooling_device stats, showing how many times h/w mitigation was active.Actually the stats are for software mitigation. For the driver, create a debugfs entry like what do the other drivers or a module parameter with the stats.
Okay
quoted
quoted
quoted
The hardware limiter should let know the cpufreq framework about the frequency change. https://lkml.org/lkml/2021/6/8/1792 May be post the sensor without the hw limiter for now and address that in a separate series ?I wasn't aware about existence of the thermal pressure, thank you for pointing at it. At a quick glance it should be possible to benefit from the information about the additional pressure. Seems the current thermal pressure API assumes that there is only one user of the API. So it's impossible to aggregate the pressure from different sources, like software cpufreq pressure + h/w freq pressure. Correct? If yes, then please let me know yours thoughts about the best approach of supporting the aggregation.That is a good question. IMO, first step would be to call cpufreq_update_limits().
Right
[ Cc Thara who implemented the thermal pressure ] May be Thara has an idea about how to aggregate both? There is another series floating around with hardware limiter [1] and the same problematic. [1] https://lkml.org/lkml/2021/6/8/1791
Thanks, it indeed looks similar. I guess the common thermal pressure update code could be moved out into a new special cpufreq thermal QoS handler (policy->thermal_constraints), where handler will select the frequency constraint and set up the pressure accordingly. So there won't be any races in the code.