I am running a private server on Ubuntu Server 22.04. It's an old PC that I repurposed as a home server, and it runs various applications via Docker.
For almost 2 months now it has randomly stopped responding to any network requests. When it does, the fans and power LEDs such as the PCIe and RAM indicators stay on, but it otherwise acts as if you had shut it down. The only way to get it back up is a forced reset. There are never any logs about a crash: journalctl --system shows the log right up until the latest crash and then just abruptly stops, and the same goes for journalctl -k. There also isn't any pattern in the crash times; they are just as random as the crash itself.
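For reference, the journal check described above looks roughly like the following. This is a minimal sketch that assumes journald keeps its logs on disk (for example, /var/log/journal exists); `previous_boot_tail` is just a made-up helper name, and `-b -1` (the boot before the current one) is my assumption about how to view the crashed boot after the reset.

```shell
#!/usr/bin/env bash
# Sketch: show the tail of the previous boot's journal after a forced reset.
# Assumes persistent journald storage; without it, logs from before the
# reset are not kept.

# previous_boot_tail is a hypothetical helper name, not a system command.
previous_boot_tail() {
    local lines="${1:-30}"
    # -b -1 selects the boot before the current one; -n limits the output
    # to the last N lines of that boot.
    journalctl --system -b -1 -n "$lines" --no-pager
    # Same window, kernel messages only.
    journalctl -k -b -1 -n "$lines" --no-pager
}

# Typical use after a reset:
#   journalctl --list-boots   # list the boots journald knows about
#   previous_boot_tail 50     # last 50 lines before the hang
```

In my case both outputs simply end mid-stream, with no panic, OOM, or shutdown messages.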
For the past 1.5 weeks the uptime before a crash has suddenly dropped to around 3-6 hours; before that it would stay up for about a week.
At first I thought it was my OS, so I switched from my previous headless Debian install to Ubuntu Server 22.04; it still crashed. I have already checked the PSU with a multimeter: every voltage before and after a crash is where it should be, so I ruled that out. Then I checked my RAM with memtest and, sure enough, it had a bunch of errors, so I swapped it for a spare I had lying around and checked that: no errors. It still crashed two times after I swapped it. After that I changed my SSD, using Clonezilla to move the installation. Today I woke up and it had crashed again.
I don't really know what else to test now. I have a spare PSU that I could swap in and test; anything other than that (CPU/motherboard) would have to be ordered to rule it out.
What bugs me most is the absence of any logs: it just stops logging at (for example) 3:35 AM and doesn't respond. While checking the container logs I saw that nginx and some other containers kept running a bit longer than that; nginx served requests until 3:53 AM before everything died. There are never any crash logs in Docker. It just acts like you pulled the plug, except that the fans and power LEDs stay on.
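To put a number on that gap between the last journal entry and the last container log line, something like the following could be used. It is only a rough sketch: it assumes GNU date (as shipped with Ubuntu), `gap_minutes` is a made-up helper name, and the commands in the comments for fetching the two timestamps are my assumption about how to obtain them, with the container name left as a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: minutes between the last journal entry and the last container log line.
# The two timestamps could be read from, for example:
#   journalctl -b -1 -n 1 -o short-iso --no-pager
#   docker logs --timestamps --tail 1 <container>

# gap_minutes is a hypothetical helper name; it relies on GNU date's -d option.
gap_minutes() {
    local start end
    # -u interprets both times as UTC so a DST change cannot skew the result.
    start="$(date -u -d "$1" +%s)" || return 1
    end="$(date -u -d "$2" +%s)" || return 1
    echo $(( (end - start) / 60 ))
}

# Times from the example above: journal stops at 3:35 AM, nginx at 3:53 AM.
gap_minutes "03:35" "03:53"   # prints 18
```

So in that instance the containers kept going for about 18 minutes after the journal went silent.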
EDIT:
Forgot the system specs:
- CPU - Ryzen 5 1600X
- GPU - GTX 1080
- RAM - G.Skill Aegis 16GB DDR4-2666 (before), G.Skill Fortis 32GB DDR4-2400 (now)
- NVMe SSD - Patriot P300 256GB (before), Samsung 960 Evo 250GB (now)
- PSU - 550W plugged into a 300VA UPS