Unexpected server crashes

My Ubuntu server, running Ubuntu Desktop (Ubuntu 22.04.2 LTS, GNU/Linux 5.19.0-38-generic x86_64), has started crashing unexpectedly in the last 24 hours.

This happens seemingly at random. When it does, the server's fans spin at 100% and I am locked out of SSH. After a hardware power reset I can get access again, and the server appears to operate normally. The same thing occurred 12 hours later. As I have now seen two restarts in 24 hours, I can't attribute this to an isolated event.

Things I've checked:

  1. There is 91GB free on the SSD running the OS, so I don't think it's disk-space related.
  2. Recent changes? I may have done an update/upgrade install recently, but I've done this many times without issue.
  3. It may be a random hardware issue, but the server hasn't received any knocks or drops. I appreciate hardware can still just fail.
  4. dmesg shows some CIFS errors. However, I'd be very surprised if this caused it to hang/crash. I've not made any recent fstab changes and don't see why this would cause the server to lock up so severely.

How do I investigate this please? Is this likely to be hardware or software?

I'm reluctant to reinstall Ubuntu, as I have a lot of setup/config on there that would take a long time to replace. Yes, I should get my config and deployments backed up, and will do that when I can...

Thank you for your time

kern.log (before crash)

Apr  9 20:26:13 cruz-NUC8i5BEH kernel: [43628.769841] perf: interrupt took too long (2518 > 2500), lowering kernel.perf_event_max_sample_rate to 79250
Apr  9 20:31:36 cruz-NUC8i5BEH kernel: [43952.000935] perf: interrupt took too long (3183 > 3147), lowering kernel.perf_event_max_sample_rate to 62750
Apr  9 20:41:37 cruz-NUC8i5BEH kernel: [44552.665406] perf: interrupt took too long (3984 > 3978), lowering kernel.perf_event_max_sample_rate to 50000
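These kern.log lines are info-level messages from the kernel's perf subsystem: sampling interrupts took longer than their time budget, so the kernel raised the budget and lowered kernel.perf_event_max_sample_rate. A minimal Python sketch re-deriving the logged numbers follows. It assumes CONFIG_HZ=250 and the default kernel.perf_cpu_time_max_percent of 25; both are assumptions about this machine, not values read from it.

```python
# Re-derives the kernel's perf throttling arithmetic for the logged values.
# Assumptions (not read from this machine): CONFIG_HZ=250 and the default
# kernel.perf_cpu_time_max_percent of 25.
HZ = 250
TICK_NSEC = 1_000_000_000 // HZ  # length of one timer tick in ns
CPU_TIME_MAX_PERCENT = 25


def next_limits(avg_len_ns: int) -> tuple:
    """Return (new allowed ns per sample, new perf_event_max_sample_rate)."""
    allowed_ns = avg_len_ns + avg_len_ns // 4  # 25% headroom over the average
    budget_ns = (TICK_NSEC // 100) * CPU_TIME_MAX_PERCENT  # perf time allowed per tick
    per_tick = budget_ns // allowed_ns if allowed_ns < budget_ns else 1
    return allowed_ns, per_tick * HZ


# Average interrupt durations (ns) taken from the three kern.log lines.
for avg in (2518, 3183, 3984):
    allowed, rate = next_limits(avg)
    print(f"avg {avg} ns -> next threshold {allowed} ns, max_sample_rate {rate}")
```

Under these assumptions it prints rates of 79250, 62750 and 50000, matching the log, and the new thresholds (3147, 3978) are exactly the values on the right of the following log lines. That points to routine self-throttling rather than a record of the crash itself.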

syslog (last line before crash)

Apr  9 20:44:20 cruz-NUC8i5BEH jellyfin[1238]: [20:44:20] [INF] FFmpeg exited with code 0

auth.log (restart happened about 30s before 20:59)

Apr  9 20:30:01 cruz-NUC8i5BEH CRON[128833]: pam_unix(cron:session): session closed for user root
Apr  9 20:59:21 cruz-NUC8i5BEH sshd[793]: Server listening on 0.0.0.0 port 22.
Apr  9 20:59:21 cruz-NUC8i5BEH sshd[793]: Server listening on :: port 22.
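The excerpts above can be lined up to bound when the machine stopped logging. A minimal Python sketch that finds the gap between the last pre-crash entry and the first post-boot entry is below. Syslog timestamps omit the year, so the year 2023 is a hypothetical placeholder that cancels out in the subtraction. `%b` is also assumed to parse English month names (C/English locale).

```python
from datetime import datetime

# Hypothetical placeholder year: syslog timestamps omit it, and it cancels out
# when two timestamps are subtracted.
PLACEHOLDER_YEAR = 2023

PRE_CRASH = [
    "Apr  9 20:30:01 cruz-NUC8i5BEH CRON[128833]: pam_unix(cron:session): session closed for user root",
    "Apr  9 20:41:37 cruz-NUC8i5BEH kernel: [44552.665406] perf: interrupt took too long (3984 > 3978)",
    "Apr  9 20:44:20 cruz-NUC8i5BEH jellyfin[1238]: [20:44:20] [INF] FFmpeg exited with code 0",
]
POST_BOOT = [
    "Apr  9 20:59:21 cruz-NUC8i5BEH sshd[793]: Server listening on 0.0.0.0 port 22.",
]


def parse_stamp(line: str) -> datetime:
    """Parse the fixed-width 'Mon DD HH:MM:SS' prefix (15 chars) of a syslog line."""
    return datetime.strptime(f"{PLACEHOLDER_YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")


last_before = max(parse_stamp(line) for line in PRE_CRASH)
first_after = min(parse_stamp(line) for line in POST_BOOT)
gap = first_after - last_before
print(f"last entry before crash: {last_before:%H:%M:%S}")
print(f"first entry after boot:  {first_after:%H:%M:%S}")
print(f"silent window: {gap}")
```

With these excerpts the window is about 15 minutes. Since the reset happened roughly 30 seconds before 20:59, the hang most likely began somewhere between 20:44:20 and the reset, although the excerpts alone cannot distinguish a hang from a quiet period with nothing to log.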

dmesg: nothing shows here until after the reboot, so it is of little value.

waltinator:
Use `sudo journalctl -b -1 -e` to see the logs leading up to the crash.
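The suggested check can be wrapped in a small script. A minimal Python sketch follows. It assumes a systemd host whose journal is stored persistently (otherwise there is no previous boot to read; see `Storage=` in journald.conf), and, as in the comment, it should be run with sudo so the system journal is readable. The helper name `previous_boot_cmd` is hypothetical.

```python
import shutil
import subprocess
from typing import List, Optional


def previous_boot_cmd(lines: int = 200, priority: Optional[str] = None,
                      kernel_only: bool = False) -> List[str]:
    """Build a journalctl command for the last `lines` entries of the previous boot."""
    cmd = ["journalctl", "-b", "-1", "-n", str(lines), "--no-pager"]
    if kernel_only:
        cmd.append("-k")  # kernel messages only: the previous boot's "dmesg"
    if priority:
        cmd += ["-p", priority]  # e.g. "warning" shows warning and more severe
    return cmd


if __name__ == "__main__":
    if shutil.which("journalctl") is None:
        print("journalctl not found; run this on the affected systemd host")
    else:
        # check=False: journalctl exits non-zero when no previous boot is stored
        result = subprocess.run(previous_boot_cmd(), capture_output=True,
                                text=True, check=False)
        print(result.stdout or result.stderr)
```

Setting `kernel_only=True` adds `-k`, which returns the previous boot's kernel messages, i.e. what `dmesg` would have shown before the power reset.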
waltinator:
Check your computer's fans, filters and airflow. CPU overheating can result in an instant power-off with no log entries. How dusty is your computer? Dust is a very good insulator, and it keeps the heat in the chip.
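Because an overheating shutdown may leave no log entries, one option is to record temperatures yourself. A minimal Python sketch follows. It assumes the kernel exposes thermal zones under /sys/class/thermal, where each zone's `temp` file holds millidegrees Celsius. The log path /var/tmp/temps.log and the helper names are hypothetical.

```python
import os
import sys
import time
from pathlib import Path

THERMAL_ROOT = Path("/sys/class/thermal")
LOG_PATH = Path("/var/tmp/temps.log")  # hypothetical location; pick any persistent path


def read_zone_temps(root: Path = THERMAL_ROOT) -> dict:
    """Map 'thermal_zoneN:type' to degrees Celsius for every readable zone."""
    temps = {}
    for zone in sorted(root.glob("thermal_zone*")):
        try:
            kind = (zone / "type").read_text().strip()
            millideg = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # some zones are unreadable or report errors; skip them
        temps[f"{zone.name}:{kind}"] = millideg / 1000.0
    return temps


def format_line(stamp: str, temps: dict) -> str:
    """One log line: timestamp followed by zone=temperature pairs."""
    readings = " ".join(f"{name}={value:.1f}C" for name, value in temps.items())
    return f"{stamp} {readings}"


def log_forever(path: Path = LOG_PATH, interval_s: float = 60.0) -> None:
    """Append a reading every interval_s seconds, forcing each line to disk."""
    while True:
        line = format_line(time.strftime("%Y-%m-%d %H:%M:%S"), read_zone_temps())
        with path.open("a") as fh:
            fh.write(line + "\n")
            fh.flush()
            os.fsync(fh.fileno())  # so the last reading survives a hard power-off
        time.sleep(interval_s)


if __name__ == "__main__":
    if "--loop" in sys.argv:
        log_forever()
    else:
        print(format_line(time.strftime("%Y-%m-%d %H:%M:%S"), read_zone_temps()))
```

Run once to see the current readings, or with `--loop` (for example under `nohup`) to keep a record. After the next crash, the tail of the log shows the last temperatures recorded before the machine went down.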
91GB of free space is not a lot if the storage device containing it is 2TB or more. By default, 5% is set aside for use by the system only. This can result in applications being told they have zero space and entering into race conditions.
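Whether a "free" figure already excludes that reserve depends on how it was measured. A minimal Python sketch using `os.statvfs` separates the two: `f_bfree` counts all free blocks, while `f_bavail` counts only those available to unprivileged processes, so on ext4 the difference is the root-only reserve. The mount point `/` is an assumption; pass the mount point of the OS filesystem.

```python
import os

GIB = 1024 ** 3


def space_report(path: str = "/") -> dict:
    """Size, free space and root-only reserve of the filesystem holding `path`, in GiB."""
    st = os.statvfs(path)
    size = st.f_blocks * st.f_frsize
    free_total = st.f_bfree * st.f_frsize  # free blocks, including the root-only reserve
    free_user = st.f_bavail * st.f_frsize  # free blocks usable by unprivileged processes
    return {
        "size_gib": size / GIB,
        "free_incl_reserved_gib": free_total / GIB,
        "free_for_users_gib": free_user / GIB,
        "reserved_gib": (free_total - free_user) / GIB,
    }


if __name__ == "__main__":
    for key, value in space_report("/").items():
        print(f"{key:>24}: {value:8.1f}")
```

If the 91GB came from `df`, it is the Avail column, which is based on `f_bavail` and so already excludes the reserve.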