
20.04 Random & Frequent Reboots - No detectable reason


We recently deployed some new hardware, and since day one it has been rebooting at random, and a lot. I've actually been working at the console when it just rebooted without any warning.

We've gone down a bunch of rabbit holes trying to troubleshoot, but so far nothing has panned out. It's happening on multiple devices, which makes me think it is not a hardware problem with a single bad unit.

First we thought it might be heat, as these are deployed "in the field," but the reboots happen at all hours of the day and night, not just at the hottest times of the day. Sometimes it's in the middle of the night, when it's 50 degrees F in the cabinet and the device is running at its lowest load.

It does, however, seem to be during times of heaviest CPU load. Here are recent 'last reboot' entries:

reboot   system boot  5.4.0-77-generic Sun Aug  1 17:31   still running
reboot   system boot  5.4.0-77-generic Sun Aug  1 15:48   still running
reboot   system boot  5.4.0-77-generic Sun Aug  1 15:32   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 19:02   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 17:56   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 17:30   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 17:17   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 16:52   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 16:40   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 23:13   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 22:37   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 22:05   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 21:42   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 21:24   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 20:53   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 20:42   still running
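
To compare the "all hours" impression with the "heaviest CPU load" one, it can help to turn the 'last reboot' lines into timestamps and gaps. This is a minimal sketch, assuming the year is 2021 ('last' does not print one; Aug 1 fell on a Sunday in 2021, which matches the weekdays shown) and that the lines are pasted in verbatim. The helper names boot_times and gaps_minutes are hypothetical.

```python
from collections import Counter
from datetime import datetime

# Reboot entries from the 'last reboot' output above, pasted verbatim.
LAST_REBOOT = """\
reboot   system boot  5.4.0-77-generic Sun Aug  1 17:31   still running
reboot   system boot  5.4.0-77-generic Sun Aug  1 15:48   still running
reboot   system boot  5.4.0-77-generic Sun Aug  1 15:32   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 19:02   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 17:56   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 17:30   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 17:17   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 16:52   still running
reboot   system boot  5.4.0-77-generic Sat Jul 31 16:40   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 23:13   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 22:37   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 22:05   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 21:42   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 21:24   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 20:53   still running
reboot   system boot  5.4.0-77-generic Fri Jul 30 20:42   still running
"""


def boot_times(text, year=2021):
    """Return the boot timestamps in `text` as datetimes, oldest first.

    'last' omits the year, so it is passed in (2021 is an assumption).
    """
    times = []
    for line in text.splitlines():
        fields = line.split()
        # Expected: reboot system boot <kernel> <weekday> <month> <day> <HH:MM> ...
        if len(fields) < 8 or fields[0] != "reboot":
            continue
        month, day, hhmm = fields[5], fields[6], fields[7]
        times.append(datetime.strptime(f"{year} {month} {day} {hhmm}", "%Y %b %d %H:%M"))
    return sorted(times)


def gaps_minutes(times):
    """Whole minutes elapsed between each boot and the next one."""
    return [int((later - earlier).total_seconds() // 60)
            for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    times = boot_times(LAST_REBOOT)
    print("gaps (min):", gaps_minutes(times))
    # Boots per hour of day, to line up against load or temperature logs.
    print("by hour:", sorted(Counter(t.hour for t in times).items()))
```

For the sixteen entries above, the shortest gap between boots is 11 minutes. Lining the per-hour counts up against load or temperature history may show whether the reboots really track CPU load.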

dmesg doesn't show anything useful related to the reboots. We've tailed /var/log/kern.log and /var/log/syslog all day, but nothing gets logged just before the reboots.
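
Because dmesg and a live tail only cover the current boot, one more place to look is the end of the previous boot in the systemd journal. This is a minimal sketch wrapping 'journalctl -b -1 -n 50' (boot -1 is the one before the current boot). It assumes the journal is persistent (a /var/log/journal directory exists); otherwise earlier boots are not kept. A tail that simply stops, with no shutdown messages, may itself hint at an abrupt reset rather than a clean reboot. The function names are hypothetical.

```python
import subprocess


def journal_tail_cmd(boot_offset=-1, lines=50):
    """Build a journalctl command for the last `lines` entries of an earlier boot.

    boot_offset=-1 is the boot before the current one, -2 the one before that.
    """
    return ["journalctl", "-b", str(boot_offset), "-n", str(lines),
            "--no-pager", "-o", "short-iso"]


def show_previous_boot_tail(boot_offset=-1, lines=50):
    """Return the journal tail of an earlier boot, or the error text."""
    try:
        # check=False: journalctl exits non-zero if that boot is not in the journal.
        result = subprocess.run(journal_tail_cmd(boot_offset, lines),
                                capture_output=True, text=True, check=False)
    except FileNotFoundError:
        return "journalctl not found (is this a systemd system?)"
    return result.stdout if result.returncode == 0 else result.stderr


if __name__ == "__main__":
    print(show_previous_boot_tail())
```

It may need to run as root (or as a member of the adm or systemd-journal group) to see system messages.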

Thinking that it might be heat-related, we ran 'watch -n 1 sensors' around the times when reboots are most likely. Although the CPU was "warm," it stayed below the HIGH limit and 20-30 degrees C below the CRITICAL limit, which as I understand it is where it would shut down or reboot.
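
Watching sensors interactively only helps if someone is looking at the screen when the box resets. This sketches an alternative: append a timestamped reading plus the 1-minute load average every second, and fsync each line so the last readings before an abrupt reboot are likely to survive on disk. It assumes the kernel exposes temperatures under /sys/class/thermal (values there are in millidegrees Celsius; zone names vary by platform and may not match what 'sensors' reports from hwmon). The log path /var/tmp/thermal.log and the function names are hypothetical.

```python
import glob
import os
import time
from datetime import datetime


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {"thermal_zoneN:type": degrees_C} from the thermal sysfs interface."""
    readings = {}
    for zone in sorted(glob.glob(os.path.join(base, "thermal_zone*"))):
        try:
            with open(os.path.join(zone, "type")) as f:
                name = f.read().strip()
            with open(os.path.join(zone, "temp")) as f:
                millideg = int(f.read().strip())
        except (OSError, ValueError):
            continue  # some zones cannot be read; skip them
        readings[f"{os.path.basename(zone)}:{name}"] = millideg / 1000.0
    return readings


def log_forever(path="/var/tmp/thermal.log", interval=1.0):
    """Append one line per interval and fsync it, so it survives a sudden reset."""
    with open(path, "a") as log:
        while True:
            stamp = datetime.now().isoformat(timespec="seconds")
            zones = " ".join(f"{k}={v:.1f}" for k, v in read_thermal_zones().items())
            log.write(f"{stamp} load1={os.getloadavg()[0]:.2f} {zones}\n")
            log.flush()
            os.fsync(log.fileno())  # push the line to disk, not just the page cache
            time.sleep(interval)
```

Run log_forever() in a tmux session or as a service, and after the next reboot check the last few lines of the log to see the temperature and load just before the reset.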

What can we try next to track down the cause of these reboots?

Thanks.

Maybe check your RAM. How much do you have? Run htop and see whether you are running out of memory.
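
Because htop is interactive, a reading taken just before a reboot is easy to miss. This is a minimal sketch that parses /proc/meminfo (values are in kB) and reports MemAvailable as a share of MemTotal, falling back to MemFree on kernels without MemAvailable. It could be called periodically and written to a log. The function names are hypothetical.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: value}; sizes are in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def memory_summary(path="/proc/meminfo"):
    """Return (total_kB, available_kB, percent_available)."""
    with open(path) as f:
        info = parse_meminfo(f.read())
    total = info["MemTotal"]
    avail = info.get("MemAvailable", info.get("MemFree", 0))
    return total, avail, 100.0 * avail / total
```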