RE: [PATCH thermal 1/1] thermal/core: Skip cooling device statistics update for configuration operation
From: Vadim Pasternak <vadimp@nvidia.com>
Date: 2021-09-27 17:52:56
-----Original Message----- From: Daniel Lezcano <redacted> Sent: Monday, September 27, 2021 4:56 PM To: Vadim Pasternak <vadimp@nvidia.com>; rui.zhang@intel.com Cc: linux-pm@vger.kernel.org; Ido Schimmel <idosch@nvidia.com>; Rafael J. Wysocki [off-list ref] Subject: Re: [PATCH thermal 1/1] thermal/core: Skip cooling device statistics update for configuration operation On 27/09/2021 15:29, Vadim Pasternak wrote:quoted
quoted
-----Original Message----- From: Daniel Lezcano <redacted> Sent: Monday, September 27, 2021 3:32 PM To: Vadim Pasternak <vadimp@nvidia.com>; rui.zhang@intel.com Cc: linux-pm@vger.kernel.org; Ido Schimmel <idosch@nvidia.com>;Rafael J.quoted
quoted
Wysocki [off-list ref] Subject: Re: [PATCH thermal 1/1] thermal/core: Skip cooling device statistics update for configuration operation On 27/09/2021 13:22, Vadim Pasternak wrote:quoted
Hi Daniel, Thank you for quick reply.quoted
-----Original Message----- From: Daniel Lezcano <redacted> Sent: Monday, September 27, 2021 1:42 PM To: Vadim Pasternak <vadimp@nvidia.com>; rui.zhang@intel.com Cc: =idosch@nvidia.com; linux-pm@vger.kernel.org Subject: Re: [PATCH thermal 1/1] thermal/core: Skip cooling device statistics update for configuration operation Hi Vadim, On 27/09/2021 10:24, Vadim Pasternak wrote:quoted
The thermal subsystem maintains a transition table between states that is allocated according to the maximum state supported by the cooling device. When the table needs to be updated, the thermal subsystem doesnotquoted
quoted
quoted
quoted
quoted
validate that the new state does not exceed the maximum state, leading to out-of-bounds memory accesses [1].Actually, thermal_cooling_device_stats_update() is called if the set_cur_state is successful. With a state greater than the max state, the set_cur_state should fail and thermal_cooling_device_stats_update() is not called. Perhaps the problem is in mlxsw_thermal_set_cur_state() ?"mlxsw" thermal drivers has additional use of 'sysfs' 'cur_state' for configuration purpose to limit minimum fan speed. Fan speed minimum is enforced by setting 'cur_state' with value exceeding actual fan speed maximum.Yes, and that is the problem because the driver is doing weird things with the cooling device state resulting in an abuse of the sysfs API and conflicting with the thermal internals.quoted
This feature provides ability to limit fan speed according to some system wise considerations, like absence of some replaceable units or high system ambient temperature, or some other factors which indirectly impacts system airflow.Is that a static thermal profile depending on the platform set by userspace or something which can be changed dynamically at runtime viaeg. a daemon ?quoted
Yes, this is some profiles/rules, which are system specific and according to these rules userspace can limit fan speed. Like, for example: - if one of power supplies is removed, system fan should be enforced tofullquoted
speed, because it makes a hole in a box, and it has hard impact on airflow. - If port side ambient temperature reaches some threshold X1, fan speedshouldquoted
be limited by Y1%, X2 - Y2%, etcetera. - if temperature fault is detected for any optical transceivers - some limit is required.I see, thanks for the information.quoted
quoted
quoted
For example, if cooling devices operates at cooling levels from 1 to 10 (1 for 10% fan speed, 10 for 100% fan speed), cooling device minimal speed can be limited by setting 'cur_state' attribute through 'sysfs' to the values from 'max_state' + 1 to 'max_state * 2' (from 11 to 20). Following this example if value is set to 14 (40%) cooling levels vector will be set to 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 10 for setting device speed cooling states respectively in 40, 40, 40, 40, 40, 50, 60. 70, 80, 90, 100 percent. And it limits cooling device to operate only at 40% speed and above. Maybe it would be worth adding earlier some dedicated 'cur_state_limit' attribute for this feature, but it was not done. We have another driver required this feature and one new we are developing now, which require fan minim speed limit as well.The use case is valid but I think the approach is wrong. Probably the simplest thing to do is to set a low trip point with a minimal fan speed.For "trip_point_0_temp" there is the below definition: { /* In range - 0-40% PWM */ .type = THERMAL_TRIP_ACTIVE, .temp = MLXSW_THERMAL_ASIC_TEMP_NORM, .hyst = MLXSW_THERMAL_HYSTERESIS_TEMP, .min_state = 0, .max_state = (4 * MLXSW_THERMAL_MAX_STATE) / 10,(40%)quoted
}, For "trip_point_1_temp": { /* In range - 40-100% PWM */ .type = THERMAL_TRIP_ACTIVE, .temp = MLXSW_THERMAL_ASIC_TEMP_HIGH, .hyst = MLXSW_THERMAL_HYSTERESIS_TEMP, .min_state = (4 * MLXSW_THERMAL_MAX_STATE) / 10,(100%)quoted
.max_state = MLXSW_THERMAL_MAX_STATE, }, To limit cooling device by f.e 70%, I should change dynamically 'min_state'and 'max_state'quoted
for both trips to (70%, 70%) and (70%, 100%). I am not sure I can do it? And we have many customers, using this user space interface, it would be not so good to change it.The issue is the driver is wrong here. The change you referred in the past should have not been merged. I note there is no thermal maintainer blessing in the tags.quoted
I understand it is possible to handle this issue inside driver's set_cur_stat() callback by returning positive value for configurationrequest.quoted
But maybe this feature could you useful for other developers and it could be some common interface to support it?I suggest to revert a421ce088ac8 and switch to hwmon to set the fan speed. At the first glance, it is already supported by the mlx driver, no ?
Yes, hwmon is supported. But setting fan trough hwmon means we'll have two owners controlling same device. If speed is set through hwmon for example to 100%, thermal could decide to decrease it, since it does not know what the reason was for setting full speed. And it could make a completion between hwmon and thermal for cooling control. Maybe adding new set_min_state()/get_min_state() callback could work? In such case user space can limit speed through "min_state" attrbbute. Maybe some other approach, but within thermal subsystem?
-- <http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs Follow Linaro: <http://www.facebook.com/pages/Linaro> Facebook | <http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro- blog/> Blog