I have an Ubuntu server running remotely in another office. It has gone dead a few times and I can't figure out the cause. The server sends requests to an external service via an API. By dead
I mean the machine is still running but stops working. The server's network seems to be offline too, and a LAN scan doesn't find it.
It's behind an office router and runs Ubuntu 18.04 with kernel 4.15.0-147-generic. No one onsite has an account on this server.
Here is what I have tried.
last reboot
result:
reboot system boot 4.15.0-151-gener Thu Jul 22 14:49 still running
reboot system boot 4.15.0-147-gener Wed Jul 21 15:48 still running
reboot system boot 4.15.0-147-gener Wed Jul 21 14:05 - 15:48 (01:43)
reboot system boot 4.15.0-147-gener Sat Jul 17 18:24 - 15:48 (3+21:24)
reboot system boot 4.15.0-147-gener Thu Jul 15 17:26 - 15:48 (5+22:22)
The Jul 22 14:49 entry is a reboot that I asked the onsite staff to perform. There was a power outage on Jul 21.
- /var/log/syslog
Jul 22 09:08:50 localhost service_start.sh[946]: INFO:launcher:myjob finish a output for 2.
Jul 22 09:08:50 localhost service_start.sh[946]: INFO:launcJul 22 14:50:05 localhost systemd[1]: Starting Flush Journal to Persistent Storage...
Jul 22 14:50:05 localhost systemd[1]: Started LVM2 metadata daemon.
Jul 22 14:50:05 localhost systemd[1]: Started Load/Save Random Seed.
Jul 22 14:50:05 localhost lvm[443]: 2 logical volume(s) in volume group "localhost-vg" monitored
Jul 22 14:50:05 localhost systemd[1]: Started Set the console keyboard layout.
Jul 22 14:50:05 localhost systemd-modules-load[436]: Inserted module 'iscsi_tcp'
The system went offline after Jul 22 09:08:50. The entries starting at Jul 22 14:50:05 are from the reboot mentioned above.
It looks like the system was not rebooted or shut down cleanly, otherwise there should be some log entries indicating that. There are no system error messages in syslog either.
There are two user cron jobs set up to run every 5 and 10 minutes, and syslog shows cron entries around Jul 22 09:05:01, shortly before the system went dead around Jul 22 09:08:50.
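To check whether the earlier incidents left the same kind of silent period in syslog, a small script along these lines might help. This is only a sketch: `find_gaps`, `year` and `threshold` are names I made up, it assumes the traditional "Mon DD HH:MM:SS" timestamp at the start of each line, and the year has to be passed in because that format does not record it.

```python
from datetime import datetime, timedelta


def find_gaps(lines, year, threshold=timedelta(minutes=10)):
    """Return (previous, current) timestamp pairs that are further apart than threshold."""
    gaps = []
    prev = None
    for line in lines:
        # Traditional syslog lines begin with a 15-character "Mon DD HH:MM:SS" stamp.
        try:
            ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts))
        prev = ts
    return gaps
```

It could be run against the log with something like `find_gaps(open("/var/log/syslog", errors="replace"), year=...)`. With cron writing an entry every 5 minutes, a 10-minute threshold should flag any period where the machine stopped logging.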
There are no technical people onsite, and at the moment I can only reach the server via TeamViewer from another onsite computer.
I ran htop and the system load was light.
I am at a loss right now. What else should I check during my next TeamViewer session?