Score:0

Nagios NRPE check_procs process state monitoring on a Linux server: how to be notified when a process is restarted

I need Nagios to notify me when a process on a remote server is restarted.

The one thing I don't know is how to check the process state, and what the right way to do it is.

On the remote server I currently have this NRPE command:

./check_procs -c 1: -a "/usr/local/yyyprogram/sbin/XXXdaemon" -s Sl

This process must run all the time and has its own restart mechanism, so the only thing I need to know is exactly when it restarts. Which process states should I add here, and how? For example, is -s SlRD correct, or should it be -s Sl -s R -s D? Or is there another way to get this information as OK|WARNING|UNKNOWN|CRITICAL? The only acceptable status for me is OK (meaning the process is working).

Also, how should I monitor it from another Nagios server? Should I check every second? It is fine if I am notified a minute or two after the service restarts, but how can I know it happened without checking logs? After a restart, the service has a different PID than before.

How can I be sure that all states are covered by the NRPE command configuration line?

Please help:)

EDIT

root@server:/usr/local/nagios/libexec# ./check_procs -vv -a "/usr/local/yyyprogram/sbin/xxxdaemon"
CMD: /usr/bin/ps axwwo 'stat uid pid ppid vsz rss pcpu cgroup:256 comm args'
Matched: uid=0 vsz=9412 rss=2804 pid=517515 ppid=1 jid=0 pcpu=0.20 stat=Sl etime= prog=xxxdaemon args=/usr/local/yyyprogram/sbin/xxxdaemon -d /usr/local/yyyprogram/conf -b
 cgroup_hierarchy=(null)
Score:0

First and foremost, if you are interested in how long a process has been running, check_procs does not offer that functionality as far as I can see from the -h flag, so I'm not sure why you are assuming it does. Or is that not what you're trying to check?

If you want to check how long a process has been running, you don't need a plugin for it. This example looks up the PID of netdata, asks ps for etimes (the elapsed time in seconds since the process started), uses grep to keep only the line with the number (dropping the ELAPSED header), and uses xargs to strip the whitespace around the number:

$ ps -p $(pidof /usr/sbin/netdata) -o etimes | grep -E "[1-9].*" | xargs
65805

$ systemctl restart netdata

$ ps -p $(pidof /usr/sbin/netdata) -o etimes | grep -E "[1-9].*" | xargs
10

All you have to do is write a shell script that checks whether the value is below a certain threshold and, if it is, exits with a non-zero Nagios status (1 for WARNING, 2 for CRITICAL), then run that script over NRPE from Nagios.
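As a rough illustration of such a script, here is a minimal sketch. The function names classify_etimes and check_restart, the default threshold of 120 seconds, and the choice of CRITICAL rather than WARNING are all assumptions made for the example, not something check_procs or NRPE provides. It also assumes pidof and a ps that supports the etimes column are available on the remote server.

```shell
#!/bin/sh
# Sketch of a Nagios-style plugin: report a problem when a process has been
# running for less than a threshold, which suggests it was restarted recently.
# Function names and the 120 s default are illustrative assumptions.

# classify_etimes ELAPSED_SECONDS THRESHOLD_SECONDS
# Prints a status line and returns a Nagios exit code (0 OK, 2 CRITICAL, 3 UNKNOWN).
classify_etimes() {
    case "$1" in
        ''|*[!0-9]*) echo "UNKNOWN - could not read elapsed time"; return 3 ;;
    esac
    if [ "$1" -lt "$2" ]; then
        echo "CRITICAL - process started ${1}s ago (threshold ${2}s), probably restarted"
        return 2
    fi
    echo "OK - process has been running for ${1}s"
    return 0
}

# check_restart PROGRAM [THRESHOLD_SECONDS]
check_restart() {
    pid=$(pidof -s "$1") || { echo "CRITICAL - $1 is not running"; return 2; }
    # "etimes=" prints the elapsed seconds without the header line
    elapsed=$(ps -o etimes= -p "$pid" | tr -d ' ')
    classify_etimes "$elapsed" "${2:-120}"
}

# Run as a plugin only when arguments are given, so the functions can also be sourced.
if [ "$#" -gt 0 ]; then
    check_restart "$@"
    exit $?
fi
```

Saved as, say, check_restart.sh (a hypothetical name), it could be called as ./check_restart.sh /usr/local/yyyprogram/sbin/XXXdaemon 120 and exposed through an NRPE command entry in the same way as the existing check_procs command. The threshold should be larger than the Nagios check interval: with a check every 60 seconds and a threshold of 120 seconds, at least one check should land inside the window after a restart, and the service returns to OK by itself once the process has been up longer than the threshold.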

Kamil Bu
No, I do not need to know how long it has been running. I need to know, and be informed by Nagios, that it has restarted. And I do not know which flags I should monitor or how to get this information.
pzkpfw
Checking the etimes would tell you if it has restarted, and I just told you how to check it. In what way does this not answer your question? What have you tried so far?