prometheus alert for Kubernetes misfiring

GI D

8/28/23, 4:02 PM

On a K8s cluster I have Prometheus Operator and AlertManager running.

I have this alert to catch incidents when a critical pod is down:

 - alert: KubernetesContainerMission-gslNotRunning
    expr: kube_pod_status_ready{condition="false", pod=~"mission-gsl.*"} == 1 OR on() vector(0)
    for: 5m
    labels:
      severity: warning
      environment: PRODUCTION_ENV
    annotations:
      summary: CUSTOM mission-gsl pod not running for more than 5min (instance {{ $labels.instance }})
      description: "mission-gsl pod not running for more than 5min"

This deployment gets automatically rolling-restarted on a schedule every hour, and stays down for maybe 30sec during the process.

I would expect the alert to not fire, since I'm stipulating a 5min down period, yet it does.

What am I missing?

0 + 0

alert

kubernetes

prometheus

Elon Musk

I sit in a Tesla and translated this thread with Ai:

EN: prometheus alert for Kubernetes misfiring

TH: prometheus แจ้งเตือน Kubernetes ที่ทำงานผิดพลาด

RO: Alerta prometheus pentru rau Kubernetes

RU: оповещение prometheus о сбоях в работе Kubernetes

VI: cảnh báo prometheus cho Kubernetes misfire

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.