Intro
Hey guys, first of all, I'm a total noob with Ubuntu. I installed Ubuntu 5 days ago to run Stable Diffusion with ROCm. After 5 days of trying (and a headache), I finally got Stable Diffusion running today. I can easily generate images using txt2image at 512x512 resolution, but when I try to upscale the same image with image2image, the system just shuts down. After checking the logs, I found this:
Aug 27 11:19:34 strix-g15 kernel: [ 651.876234] amdgpu 0000:03:00.0: amdgpu: ERROR: GPU over temperature range(SW CTF) detected!
Aug 27 11:19:34 strix-g15 kernel: [ 651.876244] amdgpu 0000:03:00.0: amdgpu: ERROR: System is going to shutdown due to GPU SW CTF!
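In case it helps anyone check their own logs for the same shutdown, here is a minimal Python sketch that picks out these SW CTF lines. The regular expression is inferred only from the two lines above, so it is an assumption and may miss other amdgpu message variants. The function name find_ctf_events is made up for illustration.

```python
import re

# Pattern inferred from the two kernel log lines quoted above.
# It is an assumption and may not match other amdgpu message formats.
CTF_PATTERN = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] amdgpu (?P<pci>[0-9a-f:.]+): "
    r"amdgpu: ERROR: (?P<message>.*SW CTF.*)"
)


def find_ctf_events(lines):
    """Return (uptime_seconds, pci_address, message) for each SW CTF line."""
    events = []
    for line in lines:
        match = CTF_PATTERN.search(line)
        if match:
            events.append(
                (float(match["uptime"]), match["pci"], match["message"].strip())
            )
    return events


# Demo using the two lines from the log above.
sample = [
    "Aug 27 11:19:34 strix-g15 kernel: [ 651.876234] amdgpu 0000:03:00.0: "
    "amdgpu: ERROR: GPU over temperature range(SW CTF) detected!",
    "Aug 27 11:19:34 strix-g15 kernel: [ 651.876244] amdgpu 0000:03:00.0: "
    "amdgpu: ERROR: System is going to shutdown due to GPU SW CTF!",
]
for uptime, pci, message in find_ctf_events(sample):
    print(f"{uptime:.6f}s  {pci}  {message}")
```

To scan a real log, the lines of a file such as /var/log/kern.log could be passed in instead of the sample list (that path is an assumption about where these messages were written).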
What I tried
Since high GPU temperatures were causing the shutdowns, I tried to increase the fan speeds using Fancontrol-GUI, CoreCtrl, and Radeon Profile. However, when I set the fan speed to 100% in these apps, there is no fan noise at all. In Windows, by contrast, I can clearly hear the fans get loud when I change the fan speed with Armoury Crate.
I also tried following this tutorial: https://wiki.archlinux.org/title/Fan_speed_control
But when I run sudo sensors-detect, I get this:
# sensors-detect version 3.6.0
# System: ASUSTeK COMPUTER INC. ROG Strix G513QY_G513QY [1.0] (laptop)
# Board: ASUSTeK COMPUTER INC. G513QY
# Kernel: 6.2.0-26-generic x86_64
# Processor: AMD Ryzen 9 5900HX with Radeon Graphics (25/80/0)
This program will help you determine which kernel modules you need
to load to use lm_sensors most effectively. It is generally safe
and recommended to accept the default answers to all questions,
unless you know what you're doing.
Some south bridges, CPUs or memory controllers contain embedded sensors.
Do you want to scan for them? This is totally safe. (YES/no): y
Module cpuid loaded successfully.
Silicon Integrated Systems SIS5595... No
VIA VT82C686 Integrated Sensors... No
VIA VT8231 Integrated Sensors... No
AMD K8 thermal sensors... No
AMD Family 10h thermal sensors... No
AMD Family 11h thermal sensors... No
AMD Family 12h and 14h thermal sensors... No
AMD Family 15h thermal sensors... No
AMD Family 16h thermal sensors... No
AMD Family 17h thermal sensors... No
AMD Family 15h power sensors... No
AMD Family 16h power sensors... No
Hygon Family 18h thermal sensors... No
Intel digital thermal sensor... No
Intel AMB FB-DIMM thermal sensor... No
Intel 5500/5520/X58 thermal sensor... No
VIA C7 thermal sensor... No
VIA Nano thermal sensor... No
Some Super I/O chips contain embedded sensors. We have to write to
standard I/O ports to probe them. This is usually safe.
Do you want to scan for Super I/O sensors? (YES/no): y
Probing for Super-I/O at 0x2e/0x2f
Trying family `National Semiconductor/ITE'... No
Trying family `SMSC'... No
Trying family `VIA/Winbond/Nuvoton/Fintek'... No
Trying family `ITE'... No
Probing for Super-I/O at 0x4e/0x4f
Trying family `National Semiconductor/ITE'... Yes
Found unknown chip with ID 0x5571
Some hardware monitoring chips are accessible through the ISA I/O ports.
We have to write to arbitrary I/O ports to probe them. This is usually
safe though. Yes, you do have ISA I/O ports even if you do not have any
ISA slots! Do you want to scan the ISA I/O ports? (YES/no): y
Probing for `National Semiconductor LM78' at 0x290... No
Probing for `National Semiconductor LM79' at 0x290... No
Probing for `Winbond W83781D' at 0x290... No
Probing for `Winbond W83782D' at 0x290... No
Lastly, we can probe the I2C/SMBus adapters for connected hardware
monitoring devices. This is the most risky part, and while it works
reasonably well on most systems, it has been reported to cause trouble
on some systems.
Do you want to probe the I2C/SMBus adapters now? (YES/no): y
Using driver `i2c-piix4' for device 0000:00:14.0: AMD KERNCZ SMBus
Next adapter: Synopsys DesignWare I2C adapter (i2c-0)
Do you want to scan it? (YES/no/selectively): y
Adapter doesn't support all probing functions.
Some addresses won't be probed.
Next adapter: Synopsys DesignWare I2C adapter (i2c-1)
Do you want to scan it? (YES/no/selectively): y
Adapter doesn't support all probing functions.
Some addresses won't be probed.
Next adapter: SMBus PIIX4 adapter port 0 at 0b00 (i2c-2)
Do you want to scan it? (YES/no/selectively): y
Client found at address 0x50
Probing for `Analog Devices ADM1033'... No
Probing for `Analog Devices ADM1034'... No
Probing for `SPD EEPROM'... Yes
(confidence 8, not a hardware monitoring chip)
Probing for `EDID EEPROM'... No
Client found at address 0x51
Probing for `Analog Devices ADM1033'... No
Probing for `Analog Devices ADM1034'... No
Probing for `SPD EEPROM'... Yes
(confidence 8, not a hardware monitoring chip)
Next adapter: SMBus PIIX4 adapter port 2 at 0b00 (i2c-3)
Do you want to scan it? (YES/no/selectively): y
Next adapter: SMBus PIIX4 adapter port 1 at 0b20 (i2c-4)
Do you want to scan it? (YES/no/selectively): y
Client found at address 0x48
Probing for `National Semiconductor LM75'... No
Probing for `National Semiconductor LM75A'... No
Probing for `Dallas Semiconductor DS75'... No
Probing for `National Semiconductor LM77'... No
Probing for `Analog Devices ADT7410/ADT7420'... No
Probing for `Analog Devices ADT7411'... No
Probing for `Maxim MAX6642'... No
Probing for `Texas Instruments TMP435'... No
Probing for `National Semiconductor LM73'... No
Probing for `National Semiconductor LM92'... No
Probing for `National Semiconductor LM76'... No
Probing for `Maxim MAX6633/MAX6634/MAX6635'... No
Probing for `NXP/Philips SA56004'... No
Probing for `SMSC EMC1023'... No
Probing for `SMSC EMC1043'... No
Probing for `SMSC EMC1053'... No
Probing for `SMSC EMC1063'... No
Client found at address 0x51
Probing for `Analog Devices ADM1033'... No
Probing for `Analog Devices ADM1034'... No
Probing for `SPD EEPROM'... No
Client found at address 0x58
Probing for `Analog Devices ADT7462'... No
Probing for `Andigilog aSC7512'... No
Client found at address 0x73
Probing for `FSC Poseidon I'... No
Probing for `FSC Poseidon II'... No
Probing for `FSC Scylla'... No
Probing for `FSC Hermes'... No
Probing for `FSC Heimdal'... No
Probing for `FSC Heracles'... No
Probing for `FSC Hades'... No
Probing for `FSC Syleus'... No
Client found at address 0x77
Probing for `Asus Mozart-2'... No
Next adapter: AMDGPU SMU 0 (i2c-5)
Do you want to scan it? (yes/NO/selectively): y
Next adapter: AMDGPU SMU 1 (i2c-6)
Do you want to scan it? (yes/NO/selectively): y
Next adapter: AMDGPU DM i2c hw bus 0 (i2c-7)
Do you want to scan it? (yes/NO/selectively): y
y
Next adapter: AMDGPU DM aux hw bus 0 (i2c-8)
Do you want to scan it? (yes/NO/selectively): y
Next adapter: AMDGPU DM i2c hw bus 0 (i2c-9)
Do you want to scan it? (yes/NO/selectively): y
Next adapter: AMDGPU DM i2c hw bus 1 (i2c-10)
Do you want to scan it? (yes/NO/selectively): y
Next adapter: AMDGPU DM aux hw bus 0 (i2c-11)
Do you want to scan it? (yes/NO/selectively): y
Sorry, no sensors were detected.
This is relatively common on laptops, where thermal management is
handled by ACPI rather than the OS.
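Even though sensors-detect finds nothing new, the kernel already exposes some temperature readings under /sys/class/hwmon. Below is a minimal Python sketch for reading them directly, for example to watch the GPU temperature while an image is generating. It assumes the standard hwmon sysfs layout (a name file per chip, temp*_input values in millidegrees Celsius, optional temp*_label files). The function name read_temperatures is made up for illustration, and I can't say which sensors will actually show up on this laptop.

```python
from pathlib import Path


def read_temperatures(hwmon_root="/sys/class/hwmon"):
    """Return a list of (chip_name, label, degrees_celsius) tuples.

    Assumes the standard hwmon sysfs layout; hwmon_root is a parameter
    so the function can be pointed at a test directory.
    """
    readings = []
    for chip in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for temp_file in sorted(chip.glob("temp*_input")):
            label_file = chip / temp_file.name.replace("_input", "_label")
            if label_file.exists():
                label = label_file.read_text().strip()
            else:
                label = temp_file.name
            try:
                millidegrees = int(temp_file.read_text().strip())
            except (OSError, ValueError):
                continue  # skip sensors that cannot be read right now
            readings.append((chip_name, label, millidegrees / 1000.0))
    return readings


if __name__ == "__main__":
    for chip_name, label, celsius in read_temperatures():
        print(f"{chip_name:24s} {label:16s} {celsius:6.1f} C")
```

Running it in a loop during an upscale would show how close the GPU gets to the shutdown threshold before the system powers off.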
And when I run sudo pwmconfig, I get this:
# pwmconfig version 3.6.0
This program will search your sensors for pulse width modulation (pwm)
controls, and test each one to see if it controls a fan on
your motherboard. Note that many motherboards do not have pwm
circuitry installed, even if your sensor chip supports pwm.
We will attempt to briefly stop each fan using the pwm controls.
The program will attempt to restore each fan to full speed
after testing. However, it is ** very important ** that you
physically verify that the fans have been to full speed
after the program has completed.
Found the following devices:
hwmon0 is ADP0
hwmon1 is acpitz
hwmon2 is BAT0
hwmon3 is nvme
hwmon4 is ucsi_source_psy_USBC000:001
hwmon5 is k10temp
hwmon6 is asus
hwmon7 is asus_custom_fan_curve
hwmon8 is amdgpu
hwmon9 is amdgpu
Found the following PWM controls:
hwmon8/pwm1 current value: 104
hwmon8/pwm1 is currently setup for automatic speed control.
In general, automatic mode is preferred over manual mode, as
it is more efficient and it reacts faster. Are you sure that
you want to setup this output for manual control? (n) y
Giving the fans some time to reach full speed...
Found the following fan sensors:
hwmon6/fan1_input current speed: 0 ... skipping!
hwmon6/fan2_input current speed: 0 ... skipping!
hwmon8/fan1_input current speed: 0 ... skipping!
There are no working fan sensors, all readings are 0.
Make sure you have a 3-wire fan connected.
You may also need to increase the fan divisors.
See doc/fan-divisors for more information.
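The same values pwmconfig reports can be read straight from sysfs, which makes it easy to check whether a fan-control app actually changed anything. Here is a minimal Python sketch, assuming the standard hwmon files fan*_input (RPM), pwm1 (0 to 255), and pwm1_enable (in the generic hwmon interface, 1 means manual control and 2 or higher means automatic). The names read_fan_state and _read_int are made up for illustration.

```python
from pathlib import Path


def _read_int(path):
    """Read an integer from a sysfs file, or return None if unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def read_fan_state(hwmon_root="/sys/class/hwmon"):
    """Return one dict per hwmon chip that exposes a fan sensor or pwm1."""
    state = []
    for chip in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        fans = {f.name: _read_int(f) for f in sorted(chip.glob("fan*_input"))}
        pwm = _read_int(chip / "pwm1")
        if not fans and pwm is None:
            continue  # chip has nothing fan-related
        state.append({
            "chip": chip_name,
            "fans_rpm": fans,
            "pwm1": pwm,
            # pwm1 is a duty cycle on a 0-255 scale
            "pwm1_percent": None if pwm is None else round(pwm * 100 / 255, 1),
            "pwm1_enable": _read_int(chip / "pwm1_enable"),
        })
    return state


if __name__ == "__main__":
    for entry in read_fan_state():
        print(entry)
```

For reference, the pwm1 value of 104 in the output above is roughly 41% duty cycle on that 0 to 255 scale, yet every fan sensor still reads 0 RPM.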
Also, there are no fan control settings in the BIOS.
Laptop - ASUS ROG Strix G15 Advantage Edition
Model - G513QY-212.SG15
I've run out of ideas at this point, so any assistance would be greatly appreciated.