Score:1

CPU temperature spikes to 90°C+ only when plugged in

My Asus Vivobook K571GT, dual-booting Ubuntu 20.04, has recently started shutting down due to high temperatures (reaching 99°C+). These temperatures are reached only when the laptop is plugged in.

The BIOS is updated to the latest version, and Ubuntu is running the latest kernel. I read that this might be caused by an improperly installed Nvidia driver, so I tried several different Nvidia drivers (460, 470 and 495). I also tried disabling Nvidia altogether and running only on the integrated GPU. They all gave the same result: when plugged in, the temperature spikes from a respectable 40-45°C to 95°C in a second, without much CPU load (e.g. running the apt update command makes the CPU temperature rise to 90°C+). If I don't stop what I am doing, or a command is running and I can't stop it in time, the CPU hits the 100°C mark, which triggers the shutdown. Interestingly, if I unplug the laptop when I get a high-temperature warning, the temperature drops back to 45-50°C in a second.

Has anyone experienced something similar? The only explanation I can think of for the CPU temperature spiking when plugged in but not on battery is that the CPU is somehow getting "overclocked". I'm not sure how to verify this or, if that is the case, how to prevent it. Could it be a hardware issue, such as the AC adapter providing too much power?

Edit

grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver

/sys/devices/system/cpu/cpu0/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu10/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu11/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu1/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu2/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu3/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu4/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu5/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu6/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu7/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu8/cpufreq/scaling_driver:intel_pstate
/sys/devices/system/cpu/cpu9/cpufreq/scaling_driver:intel_pstate

grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu10/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu11/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu1/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu2/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu3/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu4/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu5/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu6/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu7/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu8/cpufreq/scaling_governor:powersave
/sys/devices/system/cpu/cpu9/cpufreq/scaling_governor:powersave

grep "model name" /proc/cpuinfo

model name  : Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz

cat /sys/devices/system/cpu/intel_pstate/no_turbo

0

Edit

ps auxc | grep -i therm

root         167  0.0  0.0      0     0 ?        I<   10:18   0:00 acpi_thermal_pm
root        1049  0.0  0.0 128808  9456 ?        Ssl  10:18   0:00 thermald

sudo dmidecode -s bios-version

X571GT.311

ls -al /etc/thermald

total 28
drwxr-xr-x   2 root root  4096 Sep  8 13:48 .
drwxr-xr-x 148 root root 12288 Nov  2 12:01 ..
-rw-r--r--   1 root root  4605 Jan 14  2019 thermal-conf.xml
-rw-r--r--   1 root root   508 Jan 14  2019 thermal-cpu-cdev-order.xml

The laptop is just a year or two old. The latest BIOS update was released just a couple of weeks ago.

cat /etc/thermald/thermal-conf.xml

<?xml version="1.0"?>

<!--
use "man thermal-conf.xml" for details
-->

<!-- BEGIN -->
<ThermalConfiguration>
<Platform>
    <Name>Generic X86 Laptop Device</Name>
    <ProductName>EXAMPLE_SYSTEM</ProductName>
    <Preference>QUIET</Preference>
    <ThermalSensors>
        <ThermalSensor>
            <Type>TSKN</Type>
            <AsyncCapable>1</AsyncCapable>
        </ThermalSensor>
    </ThermalSensors>
    <ThermalZones>
        <ThermalZone>
            <Type>SKIN</Type>
            <TripPoints>
                <TripPoint>
                    <SensorType>TSKN</SensorType>
                    <Temperature>55000</Temperature>
                    <type>passive</type>
                    <ControlType>SEQUENTIAL</ControlType>
                    <CoolingDevice>
                        <index>1</index>
                        <type>rapl_controller</type>
                        <influence> 100 </influence>
                        <SamplingPeriod> 16 </SamplingPeriod>
                    </CoolingDevice>
                    <CoolingDevice>
                        <index>2</index>
                        <type>intel_powerclamp</type>
                        <influence> 100 </influence>
                        <SamplingPeriod> 12 </SamplingPeriod>
                    </CoolingDevice>
                </TripPoint>
            </TripPoints>
        </ThermalZone>
    </ThermalZones>
</Platform>

<!-- Thermal configuration example only -->
<Platform>
    <Name>Example Platform Name</Name>
    <!--UUID is optional, if present this will be matched -->
    <!-- Both product name and UUID can contain
        wild card "*", which matches any platform
     -->
    <UUID>Example UUID</UUID>
    <ProductName>Example Product Name</ProductName>
    <Preference>QUIET</Preference>
    <ThermalSensors>
        <ThermalSensor>
            <!-- New Sensor with a type and path -->
            <Type>example_sensor_1</Type>
            <Path>/some_path</Path>
            <AsyncCapable>0</AsyncCapable>
        </ThermalSensor>
        <ThermalSensor>
            <!-- Already present in thermal sysfs,
                enable this or add/change config
                For example, here we are indicating that
                sensor can do async events to avoid polling
            -->
            <Type>example_thermal_sysfs_sensor</Type>
            <!-- If async capable, then we don't need to poll -->
            <AsyncCapable>1</AsyncCapable>
        </ThermalSensor>
        <ThermalSensor>
            <!-- Examle of a virtual sensor. This sensor
                depends on other real sensor or
                virtual sensor.
                E.g. here the temp will be
                 temp of example_sensor_1 * 0.5 + 10
            -->
            <Type>example_virtual_sensor</Type>
            <Virtual>1</Virtual>
            <SensorLink>
                <SensorType>example_sensor_1</SensorType>
                <Multiplier> 0.5 </Multiplier>
                <Offset> 10 </Offset>
            </SensorLink>
        </ThermalSensor>

    </ThermalSensors>
    <ThermalZones>
        <ThermalZone>
            <Type>Example Zone type</Type>
            <TripPoints>
                <TripPoint>
                    <SensorType>example_sensor_1</SensorType>
                    <!-- Temperature at which to take action -->
                    <Temperature> 75000 </Temperature>
                    <!-- max/passive/active
                        If a MAX type is specified, then
                        daemon will use PID control
                        to aggresively throttle to avoid
                        reaching this temp.
                     -->
                    <type>max</type>
                    <!-- SEQUENTIAL | PARALLEL
                    When a trip point temp is violated, then
                    number of cooling device can be activated.
                    If control type is SEQUENTIAL then
                    It will exhaust first cooling device before trying
                    next.
                    -->
                    <ControlType>SEQUENTIAL</ControlType>
                    <CoolingDevice>
                        <index>1</index>
                        <type>example_cooling_device</type>
                        <!-- Influence will be used order cooling devices.
                            First cooling device will be used, which has
                            highest influence.
                        -->
                        <influence> 100 </influence>
                        <!-- Delay in using this cdev, this takes some time
                        too actually cool a zone
                        -->
                        <SamplingPeriod> 12 </SamplingPeriod>
                    </CoolingDevice>
                </TripPoint>

            </TripPoints>
        </ThermalZone>
    </ThermalZones>
    <CoolingDevices>
        <CoolingDevice>
            <!--
                Cooling device can be specified
                by a type and optionally a sysfs path
                If the type already present in thermal sysfs
                no need of a path.
                Compensation can use min/max and step size
                to increasing cool the system.
                Debounce period can be used to force
                a waiting period for action
            -->
            <Type>example_cooling_device</Type>
            <MinState>0</MinState>
            <IncDecStep>10</IncDecStep>
            <ReadBack> 0 </ReadBack>
            <MaxState>50</MaxState>
            <DebouncePeriod>5000</DebouncePeriod>
            <!--
                If there are no PID parameter
                compensation increase step wise and exponentaially
                if single step is not able to change trend.
                Alternatively a PID parameters can be specified
                then next step will use PID calculation using
                provided PID constants.
            -->>
            <PidControl>
                <kp>0.001</kp>
                <kd>0.0001</kd>
                <ki>0.0001</ki>
            </PidControl>
        </CoolingDevice>
    </CoolingDevices>
</Platform>
</ThermalConfiguration>
<!-- END -->

top

top - 13:16:27 up  1:37,  1 user,  load average: 0.85, 1.32, 1.11
Tasks: 487 total,   2 running, 484 sleeping,   1 stopped,   0 zombie
%Cpu(s):  5.1 us,  2.0 sy,  1.5 ni, 90.6 id,  0.1 wa,  0.0 hi,  0.7 si,  0.0 st
GiB Mem :     15.5 total,      4.5 free,      5.0 used,      5.9 buff/cache
GiB Swap:      2.0 total,      2.0 free,      0.0 used.     10.1 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                       
  35883 root      39  19   84636  68132  12616 R  19.8   0.4   0:00.60 apt-check                     
   4842 haleks    20   0 4487900 483220 120988 S   2.6   3.0   1:49.49 gnome-shell                   
   7291 haleks    20   0  923372  60172  45804 S   2.3   0.4   1:34.25 psensor                       
  32705 haleks    20   0   24.5g 130676  77652 S   2.3   0.8   0:14.20 brave                         
    975 message+  20   0   40380  34872   4068 S   1.0   0.2   0:31.14 dbus-daemon                   
   1002 root      20   0 2332860  32620  16456 S   1.0   0.2   0:05.98 snapd                         
   4555 haleks    20   0   24.7g 147872  79744 S   1.0   0.9   1:10.25 Xorg                          
   5229 haleks    20   0 2258744 131912  45796 S   1.0   0.8   1:16.97 keybase                       
  35782 root      20   0  287276  16044  14104 S   1.0   0.1   0:00.03 packagekitd                   
    663 root     -51   0       0      0      0 S   0.7   0.0   0:38.09 irq/152-nvidia                
  21473 haleks    20   0  819496  53768  39012 S   0.7   0.3   0:07.86 gnome-terminal-               
  32564 haleks    20   0   16.6g 410380 190120 S   0.7   2.5   0:42.65 brave                         
  32596 haleks    20   0   16.6g 182632  87372 S   0.7   1.1   0:47.20 brave                         
  34076 root      20   0   25368  13280   7900 S   0.7   0.1   0:00.16 apt                           
    357 root      19  -1   68944  30764  29000 S   0.3   0.2   0:01.12 systemd-journal               
    387 root      20   0   24164   7796   4236 S   0.3   0.0   0:02.20 systemd-udevd                 
    517 root     -51   0       0      0      0 S   0.3   0.0   0:00.73 irq/148-iwlwifi               
    992 root      20   0  235188  10276   6928 S   0.3   0.1   0:02.17 polkitd                       
   1065 root      20   0  716580  12360   9072 S   0.3   0.1   0:01.60 canonical-livep               
   1349 gdm       20   0  317300   9004   7968 S   0.3   0.1   0:00.28 goa-identity-se               
   1864 root      20   0 2432052 150584  31964 S   0.3   0.9   0:07.40 lxd                           
   4545 haleks    20   0    8748   5860   4012 S   0.3   0.0   0:01.37 dbus-daemon                   
   5448 haleks    20   0 2370936 172572  33964 S   0.3   1.1   0:27.26 kbfsfuse                      
   7473 haleks    20   0  503408 143448  66476 S   0.3   0.9   0:35.84 Keybase                       
   7575 haleks    20   0  463344  40076  32528 S   0.3   0.2   0:00.39 update-notifier               
  10111 haleks    20   0  582224 166968  80480 S   0.3   1.0   0:37.21 gitkraken                     
  32662 haleks    20   0   24.4g 121680  81520 S   0.3   0.7   0:03.68 brave                         
  35783 root      20   0   24164   5228   1652 S   0.3   0.0   0:00.01 systemd-udevd                 
  35784 root      20   0   24164   5228   1652 S   0.3   0.0   0:00.01 systemd-udevd                 
  35786 root      20   0   24164   5228   1652 S   0.3   0.0   0:00.01 systemd-udevd                 
      1 root      20   0  168176  12092   8296 S   0.0   0.1   0:08.88 systemd                       
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.02 kthreadd                      
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                        
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                    
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-kblockd          
      9 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq                  
     10 root      20   0       0      0      0 S   0.0   0.0   0:00.11 ksoftirqd/0                   
     11 root      20   0       0      0      0 I   0.0   0.0   0:09.66 rcu_sched                     
     12 root      rt   0       0      0      0 S   0.0   0.0   0:00.02 migration/0                   
     13 root     -51   0       0      0      0 S   0.0   0.0   0:00.00 idle_inject/0                 
     14 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/0                       
     15 root      20   0       0      0      0 S   0.0   0.0   0:00.00 cpuhp/1                       
     16 root     -51   0       0      0      0 S   0.0   0.0   0:00.00 idle_inject/1                 
     17 root      rt   0       0      0      0 S   0.0   0.0   0:00.18 migration/1                   
     18 root      20   0       0      0      0 S   0.0   0.0   0:00.06 ksoftirqd/1                   
Doug Smythies
What CPU frequency scaling driver? `grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_driver`. What governor, plugged in and unplugged? `grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor`. What CPU make and model? `grep "model name" /proc/cpuinfo`. Is turbo enabled, plugged in and unplugged? (Method is driver dependent, intel_pstate shown): `cat /sys/devices/system/cpu/intel_pstate/no_turbo`.
heynnema
Edit your question and show me `ps auxc | grep -i therm` and `sudo dmidecode -s bios-version`. How old is this laptop? Is it very dusty? Start comments to me with @heynnema or I'll miss them.
heynnema
Reset the Power Manager by shutting down the laptop, then holding down the POWER button for ~20 seconds, then reboot and retest.
heynnema
BIOS is current. Show me `ls -al /etc/thermald`.
Doug Smythies
And/or set a lower trip point temperature for thermald. Is turbo disabled when you are unplugged, or no change?
heynnema
Show me `cat /etc/thermald/thermal-conf.xml` and `top`.
haleksandre
@DougSmythies I'm not sure about turbo and the thermald trip point. How can I verify this?
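One way to check the turbo part is to take the same snapshot once plugged in and once on battery, then compare. The function below is only a sketch: `power_snapshot` is a made-up helper name, it assumes the intel_pstate driver shown in the question, and it assumes the AC adapter appears under /sys/class/power_supply with type `Mains` (some machines, e.g. USB-C powered ones, may report something else). The optional argument exists only so the function can be pointed at a copy of the tree; it defaults to /sys.

```shell
# Hypothetical helper: print AC state, turbo state and the cpu0 governor on one line.
# $1 is the sysfs root (defaults to /sys); "?" means the value could not be read.
power_snapshot() {
    local root="${1:-/sys}" ac="?" supply no_turbo governor

    # Find the first power supply that identifies itself as mains power.
    for supply in "$root"/class/power_supply/*; do
        [ -r "$supply/type" ] || continue
        if [ "$(cat "$supply/type")" = "Mains" ]; then
            ac=$(cat "$supply/online" 2>/dev/null || echo "?")
            break
        fi
    done

    # no_turbo=0 means turbo is allowed, no_turbo=1 means turbo is disabled.
    no_turbo=$(cat "$root/devices/system/cpu/intel_pstate/no_turbo" 2>/dev/null || echo "?")
    governor=$(cat "$root/devices/system/cpu/cpu0/cpufreq/scaling_governor" 2>/dev/null || echo "?")

    echo "ac_online=$ac no_turbo=$no_turbo governor=$governor"
}
```

Run `power_snapshot` plugged in, unplug, and run it again. If `no_turbo` or the governor changes between the two runs, something is switching the power policy with the AC state.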
haleksandre
I'll try the Power Manager reset @heynnema suggested and post if there are any changes.
heynnema
Rename /etc/thermald/thermal-conf.xml to thermal-conf.xml.HOLD and restart `thermald` and retest.
Doug Smythies
I agree with @heynnema on thermald.
Score:2

Your /etc/thermald/thermal-conf.xml is incorrect. It's two example files tacked together.

Try this somewhat generic .xml file shown below.

Note: You may end up customizing the following line...

<Temperature>60000</Temperature>

Then restart thermald with:

sudo systemctl restart thermald

<?xml version="1.0"?>
<ThermalConfiguration>
  <Platform>
    <Name>Override CPU default passive</Name>
    <ProductName>*</ProductName>
    <Preference>QUIET</Preference>
    <ThermalZones>
      <ThermalZone>
        <Type>cpu</Type>
        <TripPoints>
          <TripPoint>
            <Temperature>60000</Temperature>
            <type>passive</type>
          </TripPoint>
        </TripPoints>
      </ThermalZone>
    </ThermalZones>
  </Platform>
</ThermalConfiguration>
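Since the `<Temperature>` value may need several rounds of tuning, a small helper can make the edit repeatable. This is only a sketch: `set_trip_point` is a made-up name, it assumes GNU sed (as shipped with Ubuntu), and it rewrites every `<Temperature>` element in the file, which is fine for the single-trip-point file above but not for a config with several trip points.

```shell
# Hypothetical helper: set the trip point in a thermald config file.
# $1 = path to the XML file, $2 = temperature in millidegrees C (60000 = 60 C).
# A backup copy is left next to the file as <file>.bak.
set_trip_point() {
    local file="$1" millideg="$2"

    # Accept only a plain non-negative integer.
    case "$millideg" in
        ''|*[!0-9]*)
            echo "temperature must be an integer in millidegrees C" >&2
            return 1
            ;;
    esac

    cp -- "$file" "$file.bak" || return 1

    # Replace the number inside every <Temperature>...</Temperature> element.
    sed -i -E "s|<Temperature>[[:space:]]*[0-9]+[[:space:]]*</Temperature>|<Temperature>${millideg}</Temperature>|" "$file"
}
```

To keep root-owned files out of the function, it can be run on a copy: copy /etc/thermald/thermal-conf.xml to your home directory, run `set_trip_point thermal-conf.xml 55000`, copy the result back with `sudo cp`, then run `sudo systemctl restart thermald`.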
haleksandre
I've updated the configuration file. So far it seems to have helped. I'll keep testing with the laptop plugged in throughout the day and report whether the CPU still spikes to 90°C+. Thanks for your help, I really appreciate it!
heynnema
@haleksandre Good! You didn't show me the `top` command yet.
heynnema
@haleksandre Do a `sudo apt update` while running `top` and look for cpu throttling processes at the same time, and monitor the temps.
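For the "monitor the temps" part, a terminal-only option is to poll the kernel's thermal zones in a second window. The sketch below makes some assumptions: `max_zone_temp` is a made-up helper name, and it assumes the zones under /sys/class/thermal report millidegrees C in their `temp` file, which is the usual kernel convention. Which zone corresponds to the CPU package varies by machine, so it simply reports the hottest one.

```shell
# Hypothetical helper: print the hottest thermal zone reading in whole degrees C.
# $1 is the sysfs root (defaults to /sys). Returns 1 if no zone could be read.
max_zone_temp() {
    local root="${1:-/sys}" max="" f t

    for f in "$root"/class/thermal/thermal_zone*/temp; do
        [ -r "$f" ] || continue
        t=$(cat "$f" 2>/dev/null) || continue
        # Skip empty, negative or otherwise non-numeric readings.
        case "$t" in
            ''|*[!0-9]*) continue ;;
        esac
        if [ -z "$max" ] || [ "$t" -gt "$max" ]; then
            max=$t
        fi
    done

    [ -n "$max" ] || return 1
    echo "$((max / 1000))"
}
```

While `sudo apt update` runs in one terminal, something like `while sleep 1; do echo "$(date +%T) $(max_zone_temp) C"; done` in another gives a one-line-per-second log that can be compared against what `top` shows.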
haleksandre
After a few hours, I'm still experiencing CPU temperature spikes into the 90s (°C), but so far it hasn't hit the 100°C threshold that causes a shutdown. Should I lower the trip point temperature?
heynnema
@haleksandre Yes. Try 55000, or 50000. Monitor with my previous comment. You don't want to see throttling at normal usage. Note the minor edit in my .xml text.
haleksandre
It looks like it helped make the laptop usable again when plugged in. I still get the occasional CPU temperature spike, but they've become manageable. Thanks again!
heynnema
@haleksandre I have similar CPU temp spikes. I think it's the Nvidia.
Doug Smythies
I think it is the slow response time of thermald relative to the incredibly fast rate at which the processor temperature rises under a step-function load. The temperature overshoots before thermald has time to respond.
heynnema
@DougSmythies Yes. The thermald response time can be configured in the thermal-conf.xml file. See my answer at https://askubuntu.com/questions/1400361/optimize-thermal-daemon/1400418?noredirect=1#comment2429746_1400418
Doug Smythies
@heynnema : Yes, thanks for the comment. The point is that the temperature can ramp up very fast. I am liking the new TCC offset method, because it carries no kernel code overhead.
heynnema
@DougSmythies Where can I read about the TCC offset method?
Doug Smythies
@heynnema : Sorry, I thought you had seen [my post](https://askubuntu.com/questions/1373633/how-to-troubleshoot-cpu-hw-crash-in-ubuntu-18-04/1373784#1373784), method 2.
heynnema
@DougSmythies Complicated. My computer does have TCC offset. I don't have TurboStat. What does TCC offset have to do with thermald, if anything?
Doug Smythies
@heynnema It has nothing to do with thermald. It is used instead of thermald.
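For readers wondering what a TCC offset value looks like, here is a rough sketch of decoding the register that holds it. None of this is taken from the linked post; it is based on assumptions that should be checked against Intel's documentation for the specific CPU: that the register is MSR 0x1A2 (MSR_TEMPERATURE_TARGET), that bits 23:16 hold the default throttle temperature, and that the offset is a 6-bit field in bits 29:24 (some models implement fewer bits). `decode_temp_target` is a made-up helper name.

```shell
# Hypothetical helper: decode a raw MSR 0x1A2 value (assumed field layout, see above).
# $1 = raw register value, e.g. 0x0A640000.
decode_temp_target() {
    local raw=$(( $1 ))
    local tjmax=$(( (raw >> 16) & 0xFF ))    # default throttle temperature, degrees C
    local offset=$(( (raw >> 24) & 0x3F ))   # TCC offset, degrees C below tjmax
    echo "tjmax=$tjmax offset=$offset throttle_at=$((tjmax - offset))"
}

# Example with a made-up register value:
# decode_temp_target 0x0A640000   # prints: tjmax=100 offset=10 throttle_at=90
```

On a real machine the raw value could be read with `rdmsr` from the msr-tools package after `sudo modprobe msr`, for example `decode_temp_target "0x$(sudo rdmsr 0x1a2)"`. With a non-zero offset, the processor itself starts throttling at `throttle_at` rather than at `tjmax`, without waiting for a userspace daemon.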