My Ubuntu server (20.04.3 LTS, headless) freezes about twice a week.
The first time, the last message in journalctl was
systemd: Finished Message of the Day.
and the second time, it was one from Jellyfin. The weird thing (I think) is that the computer isn't really crashing: it freezes, but the fans keep spinning. To get it working again, I always have to turn it off by holding down the power button and then turn it back on. Maybe that's obvious, but I have also seen a lot of posts where the user was still able to SSH into the machine. That isn't the case here. Even though the light on my LAN switch is still blinking (on), the router/modem says the server is offline.
I'm using
- CPU: Intel Core i5-6500 (4)
- RAM: 16 GB, with 4 GB swap
Does anyone know how to troubleshoot this?
EDIT:
Output of: $ journalctl -b -1 -e
Pastebin.com
EDIT 2:
Should be the dmesg log file of the crash: Pastebin.com
EDIT 3:
OK, so I was able to avoid any freezes over the past few days by just rebooting once a day. This suggests it's probably not a thermal issue. Any further suggestions?
EDIT 4:
Thank you @thomas-ward for the tip. I have applied the recommended changes from this answer:
https://askubuntu.com/a/45009/1028839.
Unfortunately, I couldn't find the old log from the freeze-up anymore. I will reboot once more to make the changes take effect and then wait for the next freeze-up.
EDIT 5:
Uptime: 6 days, 3 hours, 53 mins
Looks like that was the fix. Thank you very much @thomas-ward. If you'd like to submit the answer, I can tag it as the solution.
EDIT 6:
Yikes. It just crashed again after I solved a captcha in my JDownloader Docker container. Although the problem isn't fully resolved, @thomas-ward's recommendation did improve the situation a lot.
EDIT 7:
After a freeze during reboot, I have decided to blame these problems on hardware. I replaced the hard drive with an SSD, switched to Debian (unrelated) and updated the BIOS. I will add more information about this later. It could take some time, though.
EDIT 8:
Uptime: 9 days, 13 hours, 37 mins
I think it's pretty safe to say the problem is fixed now.
Notes for users with similar issues: try to 'disable' 5% of your RAM as explained in EDIT 4. If that doesn't work, try switching SATA ports on your mainboard or get a new SATA cable. In my case, no logs showed up because the HDD was failing/disconnecting outright while the computer was running.
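
For the SATA/disk part of these notes, here is a rough sketch of how one might look for disk trouble in whatever kernel messages did get saved. The function name `scan_disk_errors` and the list of patterns are my own assumptions for illustration, not an official or exhaustive list; they only match a few common libata and block-layer error messages.

```shell
# Hypothetical helper: filter kernel log text on stdin for lines that often
# accompany a failing disk or a bad SATA cable/port. The patterns are an
# illustrative, non-exhaustive assumption.
# Exit status is non-zero when no line matches (plain grep behaviour).
scan_disk_errors() {
    grep -E -i \
        -e 'ata[0-9]+(\.[0-9]+)?: .*(exception|failed command|resetting link)' \
        -e 'I/O error'
}
```

Usage might look like `journalctl -k -b -1 --no-pager | scan_disk_errors` for the previous boot, or `sudo dmesg | scan_disk_errors` on a running system. If the disk holding the logs is the one that drops out, nothing may have been written at all, so an empty result doesn't rule the disk out; `sudo smartctl -a /dev/sda` (from smartmontools, with `/dev/sda` as a placeholder device) is another place to look.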