Thread: 18 messages, 4 authors, 2023-03-21

Re: [PATCH v8 10/13] hwmon: peci: Add cputemp driver

From: "Winiarska, Iwona" <iwona.winiarska@intel.com>
Date: 2023-03-21 09:08:30
Also in: linux-aspeed, linux-devicetree, linux-doc, linux-hwmon, lkml, openbmc

On Mon, 2023-03-20 at 14:45 +0300, Paul Fertser wrote:
Hello,

We are seeing wrong DTS temperatures on at least "Intel(R) Xeon(R)
Bronze 3204 CPU @ 1.90GHz" and most probably other Skylake Xeon CPUs
are also affected, see inline.

On Tue, Feb 08, 2022 at 04:36:36PM +0100, Iwona Winiarska wrote:
Add peci-cputemp driver for Digital Thermal Sensor (DTS) thermal
readings of the processor package and processor cores that are
accessible via the PECI interface.
...
+static const struct cpu_info cpu_hsx = {
+       .reg            = &resolved_cores_reg_hsx,
+       .min_peci_revision = 0x33,
+       .thermal_margin_to_millidegree = &dts_eight_dot_eight_to_millidegree,
+};
+
+static const struct cpu_info cpu_icx = {
+       .reg            = &resolved_cores_reg_icx,
+       .min_peci_revision = 0x40,
+       .thermal_margin_to_millidegree = &dts_ten_dot_six_to_millidegree,
+};
...
+       {
+               .name = "peci_cpu.cputemp.skx",
+               .driver_data = (kernel_ulong_t)&cpu_hsx,
+       },
With this configuration we get this data:

/sys/bus/peci/devices/0-30/peci_cpu.cputemp.skx.48/hwmon/hwmon15# grep . temp[123]_{label,input}
temp1_label:Die
temp2_label:DTS
temp3_label:Tcontrol
temp1_input:30938
temp2_input:67735
temp3_input:80000

On the host system "sensors" report

Package id 0:  +31.C (high = +80.C, crit = +90.C)

So I conclude Die temperature as retrieved over PECI is correct while
DTS is mis-calculated. The old downstream code in OpenBMC was using
ten_dot_six_to_millidegree() function for conversion, and that was
providing expected results. And indeed if we reverse the calculation
here we get 80000 - ((80000-67735) * 256 / 64) = 30940 which matches
expectations.

Hi!

Thanks for the report.

The format used for pre-ICX platforms was changed between v2 and v3 after a
report about negative temperatures on those platforms:
https://lore.kernel.org/lkml/6891496eabcc6f9cacec4fea505fb757ea9c11fc.camel@intel.com/

Unfortunately, I'm not able to test this on Cascade Lake X (or any other pre-ICX
platform).
I just sent a patch that changes SKX to use S10.6 format:
https://lore.kernel.org/lkml/20230321090410.866766-1-iwona.winiarska@intel.com/
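
For reference, the difference between the two fixed-point interpretations can
be sketched as below. This is only an illustration, not the driver code: the
helper names are hypothetical, and the raw margin value 3140 is an assumption
back-derived from the numbers in the report (Tcontrol 80000, DTS 67735), not a
value read from hardware. It also assumes DTS is computed as Tcontrol minus
the converted margin, as the arithmetic in the report does.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; they are not the functions
 * from the peci-cputemp driver.
 */

/* Treat raw as signed S8.8: 8 fractional bits, so scale by 1000 / 256. */
static int32_t s8_8_to_millidegree(uint16_t raw)
{
	return (int32_t)(int16_t)raw * 1000 / 256;
}

/* Treat raw as signed S10.6: 6 fractional bits, so scale by 1000 / 64. */
static int32_t s10_6_to_millidegree(uint16_t raw)
{
	return (int32_t)(int16_t)raw * 1000 / 64;
}

/* Assumption: DTS temperature = Tcontrol - thermal margin (millidegrees). */
static int32_t dts_from_margin(int32_t tcontrol_mc, int32_t margin_mc)
{
	return tcontrol_mc - margin_mc;
}
```

With the assumed raw value 3140 and Tcontrol 80000, the S8.8 path gives
80000 - 12265 = 67735 (the reported temp2_input), while the S10.6 path gives
80000 - 49062 = 30938 (the reported temp1_input).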

Thanks
-Iwona