Score:2

Debugging Prometheus OOMkilled despite 6Gi limits


I'm at the end of my patience with a prometheus setup leveraging kube-prometheus-stack 44.3.0 (latest being 45).

I have two environments, staging and prod. In staging, my prometheus runs smoothly. In prod it has started crashing with OOMKilled errors roughly every 4 minutes.

Things I already tried:

  • Increased the scrape interval from 30s to 300s
  • Identified heavy metrics and dropped them before ingestion [More on that later]
  • Enabled web.enable-admin-api to query the TSDB and clean tombstones
  • Deleted PrometheusRules, having noticed that they tended to shorten the pod's life until the next crash
  • Raised the resources (limits and requests) to the maximum available on the nodes I'm using - the memory limit is currently at 6Gi, while staging works with under 1Gi (see the values sketch right after this list)
  • Reduced the number of scrape targets (taking down e.g. etcd metrics)
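
For reference, these settings roughly correspond to the following chart values - a minimal sketch assuming the standard kube-prometheus-stack layout, with illustrative numbers rather than my exact config:

  prometheus:
    prometheusSpec:
      scrapeInterval: 300s        # raised from the 30s default
      enableAdminAPI: true        # exposes the TSDB admin endpoints
      resources:
        requests:
          memory: 6Gi
        limits:
          memory: 6Gi             # ceiling imposed by the node size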

Comparing TSDB status across staging and prod (screenshots of both status pages omitted here): while prod is up, it doesn't show higher numbers than staging - until it crashes.

By looking at TSDB statistics I noticed that kube_replicaset metrics used to swarm Prometheus. Another component in the cluster had created a high number of ReplicaSets due to a bug, inflating those series. I dropped those metrics from ingestion completely:

  ...
  metricRelabelings:
  - regex: '(kube_replicaset_status_observed_generation|kube_replicaset_status_replicas|kube_replicaset_labels|kube_replicaset_created|kube_replicaset_annotations|kube_replicaset_status_ready_replicas|kube_replicaset_spec_replicas|kube_replicaset_owner|kube_replicaset_status_fully_labeled_replicas|kube_replicaset_metadata_generation)'
    action: drop
    sourceLabels: [__name__]

I verified that those ReplicaSet metrics are no longer present in the prod Prometheus.

TL;DR:

Prometheus in my K8s environment is OOMKilled continuously, making the tool nigh impossible to use. I need insight on how to find and isolate the cause of the issue. Right now the only reasonable culprit still seems to be kube-state-metrics (TODO - I need to disable it to verify the idea; see the sketch below).
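
A quick way to verify that idea would be to switch off the bundled kube-state-metrics temporarily - a minimal sketch assuming the standard kube-prometheus-stack values (the flag name is worth double-checking against the chart version):

  # temporarily disable the kube-state-metrics subchart to confirm the culprit
  kubeStateMetrics:
    enabled: false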

Related questions I've already looked at:

mdaniel:
Hi Liquid welcome to S.F. While sniffing around I saw [this comment](https://github.com/prometheus-community/helm-charts/issues/989#issuecomment-909631798) where they had **30Gi** so your 6Gi may be table-stakes at this point. However, that issue also made it seem as though it wasn't _ongoing_ prom that was eating all that memory as much as _startup_; is that your experience, too? Have you already examined [the `prometheus_tsdb_head_series` metric](https://github.com/prometheus/prometheus/issues/5019#issuecomment-448943487)?
Liquid:
@mdaniel, thanks for the tip. In my case it's quite a baby-sized cluster of around 6 nodes. I've checked how the head series travels and it's around 60k. I was able to isolate the broken component - it was related to metrics shipped by kube-state-metrics. I'll post an answer in case it could be useful for someone else.
markalex:
If you have an answer, it is always better to post it. And make it an answer, not an update to the question: that way it will be clearer for those who might stumble across a similar problem.
Score:1

Here are the most likely reasons for Prometheus eating memory:

  1. Overwhelming number of timeseries. Considering the background, this is the most plausible. In Prometheus, datapoints take little memory compared to unique timeseries. I couldn't find the link now, but AFAIR one datapoint takes around 4 bytes while a timeseries without any datapoints takes around 1KB. So having timeseries, even without any datapoints, will take space and might take memory. You can rule out this reason by comparing the number of timeseries in prod and stage: count({__name__=~".+"}). If there are significantly more timeseries in prod, you'll have to figure out why and probably further reduce their number.
  2. PromQL queries that load too much data into memory. If you have queries requesting a long time period or a huge number of timeseries, that could also be the reason, since Prometheus tries to load the requested data into memory. Since your OOM reproduces constantly, you can test this assumption by blocking all queries to Prometheus and seeing if it still hits OOM. It may be worth looking at the query log too.
  3. Not enough memory on the node. It could simply be that other containers consume memory on the node and Prometheus is killed because it has a lower QoS class. Make sure Prometheus falls into the Guaranteed QoS class (see the sketch after this list).
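
To illustrate point 3, a minimal sketch (assuming the usual kube-prometheus-stack values layout; the numbers are placeholders) that makes requests equal to limits so the Prometheus container ends up in the Guaranteed QoS class:

  prometheus:
    prometheusSpec:
      resources:
        # requests == limits (for every container in the pod) -> Guaranteed QoS,
        # so the kubelet evicts Prometheus last under node memory pressure
        requests:
          cpu: "1"
          memory: 6Gi
        limits:
          cpu: "1"
          memory: 6Gi
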
Score:1

The cause of my issue was a broken Keycloak deployment in the keycloak namespace. An old Keycloak setup was creating a high number of ReplicaSets (around 36000), which caused the high cardinality of the replicaset-related series in Prometheus.

The issue did not show up in staging, since staging didn't mirror that configuration completely.

I had already tried applying the following relabeling to kube-state-metrics, dropping those metrics before ingestion:

  - regex: '(kube_replicaset_status_observed_generation|kube_replicaset_status_replicas|kube_replicaset_labels|kube_replicaset_created|kube_replicaset_annotations|kube_replicaset_status_ready_replicas|kube_replicaset_spec_replicas|kube_replicaset_owner|kube_replicaset_status_fully_labeled_replicas|kube_replicaset_metadata_generation)'
    action: drop
    sourceLabels: [__name__]

but it proved to be too conservative. After adding:

  - regex: 'keycloak'
    action: drop
    sourceLabels: [namespace]

my instance became stable again.
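
For context, here is roughly how that rule sits in the chart values next to the earlier drop rule. The surrounding key path (kube-state-metrics.prometheus.monitor.metricRelabelings) is an assumption based on the standard kube-prometheus-stack / kube-state-metrics chart layout, so adjust it to wherever your existing metricRelabelings live:

  kube-state-metrics:
    prometheus:
      monitor:
        metricRelabelings:
          # assumed key path - drop every kube-state-metrics series from the broken namespace
          - regex: 'keycloak'
            action: drop
            sourceLabels: [namespace]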
