RE: [PATCH] thermal/intel: introduce tcc cooling driver
From: Doug Smythies <hidden>
Date: 2021-01-16 21:22:27
Subsystem:
the rest, turbostat utility · Maintainers:
Linus Torvalds, "Len Brown"
On 2021.01.16 09:08 Doug Smythies wrote:
On 2021.01.15 Zhang Rui wrote:
Added Len to the "To" list:
Turbostat has another issue with this stuff. It will be more work than I want to do to submit a fix patch, so I am not, but see further down for my hack fix.
...
Example step function overshoot, trip point set to 55 degrees C.

doug@s18:~$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 1
Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt
0.07    800     45      24      1.89    0.00
0.04    800     29      23      1.89    0.00
61.76   4546    4151    66      103.77  0.00 < step function load applied on 4 of 6 cores
67.76   4570    4476    66      120.42  0.00
68.03   4567    4488    66      120.73  0.00
67.98   4572    4492    67      121.00  0.00 < 19 degrees over trip point
68.10   4489    4493    58      109.19  0.00 < this throttling is either the power servo or the temp servo.
68.08   4262    4476    51      82.82   0.00 < this throttling is the temp servo.
68.13   4143    4513    48      75.16   0.00
68.03   4086    4488    46      71.87   0.00 < It actually undershoots often, I don't know why.
68.12   4000    4505    46      67.02   0.00 < often it doesn't undershoot.
It turns out that turbostat does not list the package temperature properly if it is started with an active TCC offset. It erroneously includes the offset in the temperature math. In the above example turbostat had also not yet been fixed for the bit mask issue. So the real temp above was 59 degrees C.
68.44   4000    4502    45      67.16   0.00
68.06   4000    4483    45      66.95   0.00
68.02   3973    4490    44      65.20   0.00
67.94   3900    4489    43      60.51   0.00
67.88   3900    4501    44      60.55   0.00
67.85   3900    4472    43      60.52   0.00
And it settled at about 56 degrees, close to what was asked for.

To proceed with my work, I did a hack fix to turbostat:

doug@s18:~/temp-k-git/linux/tools/power/x86/turbostat$ git diff
diff --git a/tools/power/x86/turbostat/turbostat.c b/tools/power/x86/turbostat/turbostat.c
index d7acdd4d16c4..7f0a22ab3a0d 100644
--- a/tools/power/x86/turbostat/turbostat.c
+++ b/tools/power/x86/turbostat/turbostat.c
@@ -4831,6 +4831,7 @@ int read_tcc_activation_temp()
                fprintf(outf, "cpu%d: MSR_IA32_TEMPERATURE_TARGET: 0x%08llx (%d C) (%d default - %d offset)\n",
                        base_cpu, msr, tcc, target_c, offset_c);

+       tcc = target_c;
        return tcc;
 }
So this:

cpu4: MSR_IA32_TEMPERATURE_TARGET: 0x2b64100d (57 C) (100 default - 43 offset)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88420000 (-9 C)

becomes this:

cpu1: MSR_IA32_TEMPERATURE_TARGET: 0x2b64100d (57 C) (100 default - 43 offset)
cpu0: MSR_IA32_PACKAGE_THERM_STATUS: 0x88400000 (36 C)

and this:

Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt
0.08    1079    928     -11     1.91    0.00

becomes this:

Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt
0.05    1046    846     32      1.94    0.00

So now back to my overshoot example:

This:
67.98 4572 4492 67 121.00 0.00 < 19 degrees over trip point
Was actually:
67.98 4572 4492 80 121.00 0.00 <<< 25 degrees over trip point
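The arithmetic behind the before/after MSR lines above can be checked by hand. What follows is only a minimal sketch, not turbostat's actual code: the bit positions are an assumption based on my reading of the Intel SDM and turbostat (target in bits 23:16 and offset in bits 29:24 of MSR_IA32_TEMPERATURE_TARGET, digital readout in bits 22:16 of MSR_IA32_PACKAGE_THERM_STATUS), and the helper names and the with_offset switch are hypothetical.

```c
#include <stdint.h>

/* Assumed field layout (not taken from this thread):
 *   MSR_IA32_TEMPERATURE_TARGET    bits 23:16  TCC activation target, degrees C
 *                                  bits 29:24  TCC offset, degrees C
 *   MSR_IA32_PACKAGE_THERM_STATUS  bits 22:16  digital readout, degrees below
 *                                              the reference temperature
 */
static int tcc_target_c(uint64_t msr)  { return (int)((msr >> 16) & 0xFF); }
static int tcc_offset_c(uint64_t msr)  { return (int)((msr >> 24) & 0x3F); }
static int therm_readout(uint64_t msr) { return (int)((msr >> 16) & 0x7F); }

/* Hypothetical helper. with_offset != 0 mimics the behaviour described in
 * this mail, where the offset is subtracted from the reference; with_offset
 * == 0 mimics the hack fix (tcc = target_c). */
static int pkg_temp_c(uint64_t target_msr, uint64_t status_msr, int with_offset)
{
    int ref = tcc_target_c(target_msr);

    if (with_offset)
        ref -= tcc_offset_c(target_msr);

    /* The readout counts down from the reference temperature. */
    return ref - therm_readout(status_msr);
}
```

With the values printed above, pkg_temp_c(0x2b64100d, 0x88420000, 1) gives -9 and pkg_temp_c(0x2b64100d, 0x88400000, 0) gives 36, which agrees with the two MSR_IA32_PACKAGE_THERM_STATUS lines.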
But let's just do it again:

doug@s18:~$ cat /sys/devices/virtual/thermal/cooling_device11/cur_state
43 <<< so 100 - 43 = 57 degrees trip point.

doug@s18:~$ sudo ~/turbostat --Summary --quiet --show Busy%,Bzy_MHz,PkgTmp,PkgWatt,GFXWatt,IRQ --interval 0.25
Busy%   Bzy_MHz IRQ     PkgTmp  PkgWatt GFXWatt
0.09    800     6       36      2.01    0.00
0.16    800     23      36      2.00    0.00
0.11    800     14      36      2.15    0.00
66.81   4461    1160    70      101.17  0.00 <<< load applied, temp up 34 degrees in less than 0.25 seconds. Normal.
68.06   4581    1126    74      117.36  0.00
67.69   4589    1119    76      119.60  0.00
67.80   4589    1125    77      120.94  0.00
67.83   4587    1132    78      120.75  0.00
67.68   4591    1125    78      121.63  0.00
68.07   4585    1139    77      121.25  0.00
67.80   4588    1121    79      121.41  0.00 <<< now 20 degrees over trip point.
68.57   4579    1139    79      121.71  0.00
...
68.03   4220    1130    63      80.28   0.00 <<< it takes quite a while (>7 seconds) to really throttle down.

...
Doug