Thread (9 messages) 9 messages, 3 authors, 2021-09-28

RE: [PATCH thermal 1/1] thermal/core: Skip cooling device statistics update for configuration operation

From: Vadim Pasternak <vadimp@nvidia.com>
Date: 2021-09-27 17:52:56

-----Original Message-----
From: Daniel Lezcano <redacted>
Sent: Monday, September 27, 2021 4:56 PM
To: Vadim Pasternak <vadimp@nvidia.com>; rui.zhang@intel.com
Cc: linux-pm@vger.kernel.org; Ido Schimmel <idosch@nvidia.com>; Rafael J.
Wysocki [off-list ref]
Subject: Re: [PATCH thermal 1/1] thermal/core: Skip cooling device statistics
update for configuration operation

On 27/09/2021 15:29, Vadim Pasternak wrote:
quoted
quoted
-----Original Message-----
From: Daniel Lezcano <redacted>
Sent: Monday, September 27, 2021 3:32 PM
To: Vadim Pasternak <vadimp@nvidia.com>; rui.zhang@intel.com
Cc: linux-pm@vger.kernel.org; Ido Schimmel <idosch@nvidia.com>;
Rafael J.
quoted
quoted
Wysocki [off-list ref]
Subject: Re: [PATCH thermal 1/1] thermal/core: Skip cooling device
statistics update for configuration operation




On 27/09/2021 13:22, Vadim Pasternak wrote:
quoted
Hi Daniel,

Thank you for quick reply.
quoted
-----Original Message-----
From: Daniel Lezcano <redacted>
Sent: Monday, September 27, 2021 1:42 PM
To: Vadim Pasternak <vadimp@nvidia.com>; rui.zhang@intel.com
Cc: =idosch@nvidia.com; linux-pm@vger.kernel.org
Subject: Re: [PATCH thermal 1/1] thermal/core: Skip cooling device
statistics update for configuration operation


Hi Vadim,


On 27/09/2021 10:24, Vadim Pasternak wrote:
quoted
The thermal subsystem maintains a transition table between states
that is allocated according to the maximum state supported by the
cooling device.

When the table needs to be updated, the thermal subsystem does
not
quoted
quoted
quoted
quoted
quoted
validate that the new state does not exceed the maximum state,
leading to out-of-bounds memory accesses [1].
Actually, thermal_cooling_device_stats_update() is called if the
set_cur_state is successful.

With a state greater than the max state, the set_cur_state should
fail and
thermal_cooling_device_stats_update() is not called.

Perhaps the problem is in mlxsw_thermal_set_cur_state() ?
"mlxsw" thermal drivers has additional use of 'sysfs' 'cur_state'
for configuration purpose to limit minimum fan speed.
Fan speed minimum is enforced by setting 'cur_state' with value
exceeding actual fan speed maximum.
Yes, and that is the problem because the driver is doing weird things
with the cooling device state resulting in an abuse of the sysfs API
and conflicting with the thermal internals.

quoted
This feature provides ability to limit fan speed according to some
system wise considerations, like absence of some replaceable units
or high system ambient temperature, or some other factors which
indirectly impacts system airflow.
Is that a static thermal profile depending on the platform set by
userspace or something which can be changed dynamically at runtime via
eg. a daemon ?
quoted
Yes, this is some profiles/rules, which are system specific and
according to these rules userspace can limit fan speed. Like, for example:
- if one of power supplies is removed, system fan should be enforced to
full
quoted
  speed, because it makes a hole in a box, and it has hard impact on airflow.
- If port side ambient temperature reaches some threshold X1, fan speed
should
quoted
  be limited by Y1%, X2 - Y2%, etcetera.
- if temperature fault is detected for any optical transceivers - some limit is
  required.
I see, thanks for the information.
quoted
quoted
quoted
For example, if cooling devices operates at cooling levels from 1 to
10
(1 for 10% fan speed, 10 for 100% fan speed), cooling device minimal
speed can be limited by setting 'cur_state' attribute through 'sysfs'
to the values from 'max_state' + 1 to 'max_state * 2' (from 11 to 20).
Following this example if value is set to 14 (40%) cooling levels
vector will be set to 4, 4, 4, 4, 4, 5, 6, 7, 8, 9, 10 for setting
device speed cooling states respectively in 40, 40, 40, 40, 40, 50,
60. 70, 80, 90,
100 percent. And it limits cooling device to operate only at 40%
speed and above.

Maybe it would be worth adding earlier some dedicated 'cur_state_limit'
attribute for this feature, but it was not done.

We have another driver required this feature and one new we are
developing now, which require fan minim speed limit as well.
The use case is valid but I think the approach is wrong. Probably the
simplest thing to do is to set a low trip point with a minimal fan speed.
For "trip_point_0_temp" there is the below definition:
	{	/* In range - 0-40% PWM */
		.type		= THERMAL_TRIP_ACTIVE,
		.temp		= MLXSW_THERMAL_ASIC_TEMP_NORM,
		.hyst		= MLXSW_THERMAL_HYSTERESIS_TEMP,
		.min_state	= 0,
		.max_state	= (4 * MLXSW_THERMAL_MAX_STATE) / 10,
(40%)
quoted
	},
For "trip_point_1_temp":
	{
		/* In range - 40-100% PWM */
		.type		= THERMAL_TRIP_ACTIVE,
		.temp		= MLXSW_THERMAL_ASIC_TEMP_HIGH,
		.hyst		= MLXSW_THERMAL_HYSTERESIS_TEMP,
		.min_state	= (4 * MLXSW_THERMAL_MAX_STATE) / 10,
(100%)
quoted
		.max_state	= MLXSW_THERMAL_MAX_STATE,
	},

To limit cooling device by f.e 70%, I should change dynamically 'min_state'
and 'max_state'
quoted
for both trips to (70%, 70%) and (70%, 100%). I am not sure I can do it?

And we have many customers, using this user space interface, it would
be not so good to change it.
The issue is the driver is wrong here. The change you referred in the past
should have not been merged. I note there is no thermal maintainer blessing
in the tags.
quoted
I understand it is possible to handle this issue inside driver's
set_cur_stat() callback by returning positive value for configuration
request.
quoted
But maybe this feature could you useful for other developers and it
could be some common interface to support it?
I suggest to revert a421ce088ac8 and switch to hwmon to set the fan speed.
At the first glance, it is already supported by the mlx driver, no ?
Yes, hwmon is supported.
But setting fan trough hwmon means we'll have two owners controlling same device.
If speed is set through hwmon for example to 100%, thermal could decide to decrease
it, since it does not know what the reason was for setting full speed.
And it could make a completion between hwmon and thermal for cooling control.

Maybe adding new set_min_state()/get_min_state() callback could work?
In such case user space can limit speed through "min_state" attrbbute.

Maybe some other approach, but within thermal subsystem?




--
<http://www.linaro.org/> Linaro.org │ Open source software for ARM SoCs

Follow Linaro:  <http://www.facebook.com/pages/Linaro> Facebook |
<http://twitter.com/#!/linaroorg> Twitter | <http://www.linaro.org/linaro-
blog/> Blog
Keyboard shortcuts
hback out one level
jnext message in thread
kprevious message in thread
ldrill in
Escclose help / fold thread tree
?toggle this help