20.04 Random & Frequent Reboots - No detectable reason

Question

Score:2

Ubuntu

kawami1910

12/2/22, 9:47 AM

We recently deployed some new hardware and since Day 1 have been experiencing random reboots, and a lot of them. I've actually been working at the console when it has rebooted without any warning.

We've gone down a bunch of rabbit holes trying to troubleshoot, but so far nothing has panned out. It's happening on multiple devices, which makes me think it is not a hardware fault in one bad device.

First we thought it might be heat, as these are deployed "in the field," but the reboots happen at all hours of the day and night, not just at the hottest times of the day. Sometimes it's in the middle of the night when it's 50 degrees F in the cabinet and the device is running at its lowest load.

Most of the reboots do, however, seem to happen during times of heaviest CPU load. Here are recent 'last reboot' entries:

reboot   system boot  5.4.0-77-generic Sun Aug  1 17:31   still running
reboot   system boot  5.4.0-77-generic Sun Aug  1 15:48   still running
reboot   system boot  5.4.0-77-generic Sun Aug  1 15:32   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 19:02   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 17:56   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 17:30   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 17:17   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 16:52   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 16:40   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 23:13   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 22:37   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 22:05   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 21:42   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 21:24   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 20:53   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 20:42   still running
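To make the clustering easier to see, the gaps between consecutive boots can be computed from a listing like the one above. This is only a rough Python sketch: the function names are made up for illustration, and because `last` does not print the year here, the year is an assumption that has to be supplied.

```python
from datetime import datetime

# Assumption: the listing does not include a year, so one must be supplied.
ASSUMED_YEAR = 2021


def boot_times(last_output, year=ASSUMED_YEAR):
    """Extract boot timestamps from `last reboot` style lines."""
    times = []
    for line in last_output.splitlines():
        fields = line.split()
        # Expected layout: reboot system boot <kernel> <weekday> <month> <day> <HH:MM> ...
        if len(fields) < 8 or fields[0] != "reboot":
            continue
        month, day, clock = fields[5], fields[6], fields[7]
        stamp = f"{year} {month} {day} {clock}"
        times.append(datetime.strptime(stamp, "%Y %b %d %H:%M"))
    return sorted(times)


def gaps_in_minutes(times):
    """Minutes between each boot and the next one."""
    return [int((b - a).total_seconds() // 60) for a, b in zip(times, times[1:])]


sample = """\
reboot   system boot  5.4.0-77-generic Sun Aug  1 17:31   still running
reboot   system boot  5.4.0-77-generic Sun Aug  1 15:48   still running
reboot   system boot  5.4.0-77-generic Sun Aug  1 15:32   still running
"""
print(gaps_in_minutes(boot_times(sample)))
```

Short gaps (tens of minutes) grouped into a few evening hours would point at something load- or schedule-related rather than a purely random fault.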

dmesg doesn't show anything useful related to the reboots. We've tailed /var/log/kern.log and syslog.log all day, but there's nothing added just before the reboots.

Thinking that it might be heat-related, we ran 'watch -n 1 sensors' around the times when the devices are most likely to reboot. Although the CPU was "warm," it was still below the HIGH limit and 20-30 degrees C below the CRITICAL limit, which as I understand it is the point where the system would shut down or reboot.
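One limitation of watching `sensors` on screen is that the last readings before a reset are lost. A possible way around that, sketched here in Python, is to append each reading to a file and force it to disk right away. The function names and the log path are hypothetical, and it assumes the `sensors` command from lm-sensors is on the PATH.

```python
import os
import subprocess
import time
from datetime import datetime


def append_synced(path, text):
    """Append text and fsync so it is more likely to survive an abrupt reset."""
    with open(path, "a") as f:
        f.write(text)
        f.flush()
        os.fsync(f.fileno())


def log_sensors(path, interval=1.0, samples=None, command=("sensors",)):
    """Run `command` repeatedly and append its timestamped output to `path`.

    samples=None loops until interrupted; `command` is a parameter so another
    tool can be substituted (assumption: it prints readings to stdout).
    """
    count = 0
    while samples is None or count < samples:
        result = subprocess.run(command, capture_output=True, text=True)
        stamp = datetime.now().isoformat(timespec="seconds")
        append_synced(path, f"--- {stamp}\n{result.stdout}")
        count += 1
        time.sleep(interval)


# Example (hypothetical path), stop with Ctrl+C:
# log_sensors("/var/tmp/sensors.log", interval=1.0)
```

After an unexpected reboot, the tail of that file would show the last temperatures recorded before the reset, which can then be compared against the HIGH and CRITICAL limits.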

What can we try next to track down the cause of these reboots?

Thanks.

reboot

shutdown

temperature
