Score:0

Grafana false positive SNMP down


NOTE: I also have Nagios running on another server that reports bandwidth warnings and up/down status. Nagios is not alerting on a single switch; only Grafana is.

Grafana version 1.14.1

I was receiving alerts every minute saying that all switches were down.

[Image: grafana_false_positive]

The metrics portion of the dashboard is:

up{instance="192.168.20.20",job="snmp"} <--- same for all 12 switches that are polled

I was able to log in to the switch during these reported "outages." No other services were showing interruption (e.g. servers connected to those switches). I have yet to see something like this, and I'm trying to figure out how I can troubleshoot. If there is not actually a problem, what would cause this false positive?

Grafana runs in a Docker container, and I cannot seem to find anything in /var/log/grafana/grafana.log.* related to switches.

Any ideas on where I could glean some info to debug this?

Score:0

Grafana is just a visualization tool, and as you can see, it is doing that job very well.

Two things:

  1. The problem may be in your data source. Check whether it actually contains data for these switches.

  2. If you are collecting metrics with a script or daemon, check that too.
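
One way to check the data source directly, bypassing Grafana, is to ask it for the same `up` series the dashboard uses. The sketch below assumes the data source is Prometheus (the `up{instance=...,job="snmp"}` syntax suggests so, but the question does not say) and that it is reachable at `http://localhost:9090`; the URL and the helper names are hypothetical. If Prometheus itself returns 0 here while the switch is reachable, the false positive originates in the scrape (for example, a scrape that times out is recorded as `up == 0`), not in Grafana.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address of the Prometheus server behind the Grafana data source.
PROMETHEUS_URL = "http://localhost:9090"


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_up_result(payload):
    """Map each instance label to True (up) or False (down)."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    states = {}
    for series in payload["data"]["result"]:
        instance = series["metric"].get("instance", "<unknown>")
        # Instant vectors carry a [timestamp, "value"] pair; the value is a string.
        _timestamp, value = series["value"]
        states[instance] = float(value) == 1.0
    return states


def fetch_up_states(base_url=PROMETHEUS_URL, job="snmp"):
    """Query the current `up` value for every target of the given job."""
    url = build_query_url(base_url, f'up{{job="{job}"}}')
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_up_result(json.load(resp))
```

Calling `fetch_up_states()` returns a dictionary keyed by instance, which can be compared against what Nagios reports for the same switches at the same moment.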

DevOpsSauce
It's not doing its job very well if the switch is up, but it's reporting down. I have no scripts crawling metrics.