We have some Linux servers that hang (get stuck) without actually stopping. How can I deal with them? The cause of the hangs isn't clear. Any guidance will be appreciated.
The problems:
- The server hangs from time to time. It doesn't stop, it just hangs. Theoretically it's still up, but in practice it has stopped working. One way to spot this is to monitor the logs: once it hangs, nothing new is written to them.
Cause: Unknown
- The server goes down from time to time, and too frequently on some servers.
Cause: Huge log size
Solution: logrotate
- The server goes down from time to time, and too frequently on some servers.
Cause: Unknown
Solution: A script that automatically restarts the service at regular intervals. I have little hope that it will work, though.
- The clients want to be able to monitor these services themselves and do things like restarting them on their own. What's the best monitoring tool that can also restart a service (i.e. something that runs scripts of my choosing)?
Are Nagios, Zabbix, or Monit used for this? Which tool is best suited to it?
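
For illustration, the log-based hang check and the auto-restart idea above could be sketched roughly as follows. This is only a sketch under assumptions: the log path, the systemd unit name, and the 10-minute threshold are hypothetical placeholders, and it assumes Python 3 is installed and the services are managed by systemd.

```python
import os
import subprocess
import time

# Assumed values for illustration only; none of these come from the real setup.
LOG_PATH = "/var/log/myapp/server.log"   # hypothetical log file
SERVICE_NAME = "myapp.service"           # hypothetical systemd unit
MAX_SILENCE_SECONDS = 600                # treat 10 minutes without log output as a hang


def seconds_since_last_write(path, now=None):
    """Return how many seconds ago the file was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def is_stale(path, max_silence_seconds, now=None):
    """True if the log has not been written to within the allowed window."""
    return seconds_since_last_write(path, now) > max_silence_seconds


def restart_service(name):
    """Restart a systemd unit; needs root or a matching sudo rule."""
    subprocess.run(["systemctl", "restart", name], check=True)


if __name__ == "__main__":
    # Restart only when the log has been silent for longer than the threshold.
    if is_stale(LOG_PATH, MAX_SILENCE_SECONDS):
        restart_service(SERVICE_NAME)
```

Something like this could be run from cron every few minutes. A server that is healthy but simply quiet would also look stale, so the threshold would have to be chosen per application.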
We're using CentOS 7 (yes, it's reaching end of life). The servers run on virtual machines, and we only have remote access. The applications are:
- Java servers
- GlassFish servers
- Tomcat servers