I currently use kube-prometheus-stack to monitor several Kubernetes clusters. Each cluster has its own deployment of kube-prometheus-stack; however, there is currently only one cluster (a) that has Alertmanager enabled. Cluster (a) also scrapes the /federate endpoint of every other cluster to collect some health metrics and alert on them.
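For reference, the federation scraping on cluster (a) can be sketched as an extra scrape job in its Helm values; the job name, match selector, and target endpoint below are placeholders, not my actual config:

```yaml
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: federate-cluster-x        # placeholder job name
        honor_labels: true                  # keep the labels as exported by the remote Prometheus
        metrics_path: /federate
        params:
          "match[]":
            - '{job="kube-state-metrics"}'  # placeholder selector for the health metrics to pull
        static_configs:
          - targets:
              - prometheus.cluster-x.example.com:9090  # hypothetical exposed endpoint
```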
To eliminate a single point of failure in case cluster (a) dies, I want to have a second cluster (b) with alerting enabled that runs in high availability mode together with cluster (a).
What is the best method to achieve that?
Regarding Prometheus:
Give the Prometheus instances in (a) and (b) exactly the same configuration, apart from perhaps a label for identification. They should contain the same data and fire the same alerts to both the (a) and (b) Alertmanagers.
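A minimal sketch of that identification label, assuming the kube-prometheus-stack values layout (the label name and values are my own choice). Note that if the label reaches Alertmanager, the otherwise identical alerts from the two replicas will have different label sets and will not deduplicate, so it should be dropped via alert relabeling:

```yaml
prometheus:
  prometheusSpec:
    externalLabels:
      cluster_replica: cluster-a   # "cluster-b" on the other cluster
    # Drop the replica label before alerts reach Alertmanager;
    # otherwise the two replicas' alerts will not deduplicate.
    additionalAlertRelabelConfigs:
      - regex: cluster_replica
        action: labeldrop
```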
Regarding Alertmanagers:
Make the (a) and (b) Alertmanagers communicate with each other so that alerts are deduplicated. This can be achieved by pointing each instance at the other's gossip (cluster) port, e.g.:

alertmanagerSpec:
  additionalPeers:
    # hypothetical DNS name exposing the other cluster's Alertmanager gossip port
    - alertmanager.cluster-b.example.com:9094
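For both Prometheus servers to actually fire alerts to both Alertmanagers, each one also needs the other cluster's Alertmanager as an additional alerting target; a sketch, assuming a hypothetical exposed endpoint:

```yaml
prometheus:
  prometheusSpec:
    additionalAlertManagerConfigs:
      - static_configs:
          - targets:
              # hypothetical endpoint exposing the other cluster's Alertmanager API
              - alertmanager.cluster-b.example.com:9093
```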
Regarding Grafana:
Is it even achievable to make Grafana highly available in this kind of deployment? I know from here that you can set up Grafana for HA by letting both instances use the same database, but how do I do that in my setup?
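One possibility might be to point both Grafana instances at a single shared external database through the grafana.ini passthrough of the bundled Grafana chart; a sketch, where the database type, host, and credentials are placeholders:

```yaml
grafana:
  grafana.ini:
    database:
      type: postgres                    # or mysql; placeholder choice
      host: shared-db.example.com:5432  # hypothetical database reachable from both clusters
      name: grafana
      user: grafana
      password: changeme                # placeholder; use a secret in practice
```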
I would be happy if someone could provide feedback on this idea.