Fans constantly at high speed on new AMD Epyc server

I recently purchased the following unit: “System ASUS RS720A-E11-RS12E/10G | 2U / 12-Bay | GPU”, equipped with dual EPYC 7713 CPUs.

I installed Ubuntu 22.04 on the system and it boots fine, but whenever the machine is running, the fans spin at high (max?) speed.

When I check the IPMI hardware monitor in the BIOS, it reads 7020 RPM for four fans, with CPU1 temperature 43 °C, CPU2 temperature 36 °C, and TR1 temperature 22 °C.

I tried upgrading to the latest kernel (6.3.7), but this did not solve the issue.

I installed lm_sensors; its output is given below. Many of the sensors appear to be raising an ALARM. The TSI temperature readings are nonsensical (e.g. +3892314.0°C), and CPUTIN reads +127.5°C, which seems alarmingly high.

Any advice would be greatly appreciated. For the time being, I can’t even figure out whether (i) the readings are correct and there is a hardware problem, or (ii) the readings are nonsense and there is a software issue.


k10temp-pci-00c3
Adapter: PCI adapter
Tctl:         +44.8°C  
Tccd1:        +37.5°C  
Tccd2:        +37.8°C  
Tccd3:        +38.2°C  
Tccd4:        +38.5°C  
Tccd5:        +35.2°C  
Tccd6:        +38.0°C  
Tccd7:        +38.8°C  
Tccd8:        +38.0°C  

nvme-pci-c100
Adapter: PCI adapter
Composite:    +23.9°C  (low  = -20.1°C, high = +89.8°C)
                       (crit = +94.8°C)

k10temp-pci-00cb
Adapter: PCI adapter
Tctl:         +41.5°C  
Tccd1:        +37.0°C  
Tccd2:        +35.5°C  
Tccd3:        +34.8°C  
Tccd4:        +36.8°C  
Tccd5:        +36.5°C  
Tccd6:        +35.0°C  
Tccd7:        +36.0°C  
Tccd8:        +36.0°C  

nct6793-isa-0290
Adapter: ISA adapter
in0:                     2.04 V  (min =  +0.00 V, max =  +1.74 V)  ALARM
in1:                   160.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in2:                     3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in3:                     3.31 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in4:                   296.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in5:                   120.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in6:                   168.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in7:                     3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in8:                     3.33 V  (min =  +0.00 V, max =  +0.00 V)  ALARM
in9:                     0.00 V  (min =  +0.00 V, max =  +0.00 V)
in10:                  160.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in11:                  168.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in12:                  168.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in13:                  160.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
in14:                  184.00 mV (min =  +0.00 V, max =  +0.00 V)  ALARM
fan1:                     0 RPM  (min =    0 RPM)
fan2:                     0 RPM  (min =    0 RPM)
SYSTIN:                +107.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = thermistor
CPUTIN:                +127.5°C  (high = +80.0°C, hyst = +75.0°C)  ALARM  sensor = CPU diode
AUXTIN0:                +94.0°C    sensor = thermistor
AUXTIN1:               +107.0°C    sensor = thermistor
AUXTIN2:               +105.0°C    sensor = thermistor
AUXTIN3:               +105.0°C    sensor = thermistor
PCH_CHIP_CPU_MAX_TEMP:   +0.0°C  
PCH_CHIP_TEMP:           +0.0°C  
PCH_CPU_TEMP:            +0.0°C  
PCH_MCH_TEMP:            +0.0°C  
TSI2_TEMP:             +3892314.0°C  
TSI3_TEMP:             +3892314.0°C  
TSI4_TEMP:             +3892314.0°C  
TSI5_TEMP:             +3892314.0°C  
TSI6_TEMP:             +3892314.0°C  
TSI7_TEMP:             +3892314.0°C  
intrusion0:            ALARM
intrusion1:            ALARM
beep_enable:           disabled
djdomi
Open a ticket with the vendor to verify the situation. If the server is new, make use of that support. It is normal for a server's fans to run at full speed while it powers on, but they should usually slow down around 10 to 30 minutes later, depending on the cooling settings.