
Kubernetes Shared Cluster


We're planning our new Kubernetes cluster infrastructure and I have some questions. Currently, we have one large cluster that is shared by all environments (dev, staging, prod) and by multiple teams. It started as a proof of concept, a demo, but as you know, nothing lasts longer than a temporary solution. This setup has some general issues, and we plan to fix them in our target architecture.

I hope that some of you can share knowledge/experience.

First of all: one cluster per application is not a solution. The applications are really small; every team has around 3-5 applications and needs about 6-20 GB of RAM across all nodes per environment. So a dedicated cluster per application is not really an option.

We plan one cluster per environment: dev, staging (QA), prod, and maybe a demo cluster for operations. Everything is and will be automated as IaC with Terraform + Ansible (Kubespray). Every team/application scope will get its own namespace, of course.

Our questions / problems:

Monitoring: Normally we use Prometheus and Grafana to monitor pod and cluster resource usage. The new setup should also include central logging (we're trying out solutions right now). This is fine for the infra team, but infra doesn't want to monitor at the application level.

Is there any working way to give the app teams their own monitoring? Something like: you (the app team) can set up alerts on logs, CPU, RAM usage, whatever you need; "you just need to roll out this Helm chart". In an ideal world, I would give every team (so every namespace) its own monitoring stack. That way we could limit its storage, RAM and CPU usage, and every team would work within the resources it has "ordered" (so if a team has a lot of logs or monitoring needs, it has to "order" more resources). With that approach, each team could also choose the software that suits it best.
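To make the "ordered resources" idea concrete: what I have in mind maps roughly onto a per-namespace ResourceQuota. The snippet below is only a sketch that builds such a manifest as JSON (which kubectl accepts alongside YAML). The helper `team_quota`, the namespace name `team1`, and all the numbers are placeholders, not our real values.

```python
import json


def team_quota(namespace: str, cpu: str, memory: str, storage: str) -> dict:
    """Build a ResourceQuota manifest capping what one team namespace may request."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        # Hypothetical naming scheme: "<namespace>-quota".
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                # Sum of CPU / memory requests over all pods in the namespace.
                "requests.cpu": cpu,
                "requests.memory": memory,
                # Sum of memory limits; pods must then declare a memory limit.
                "limits.memory": memory,
                # Sum of storage requested by all PVCs in the namespace.
                "requests.storage": storage,
            }
        },
    }


if __name__ == "__main__":
    # Placeholder values: a team that "ordered" 4 CPUs, 20Gi RAM, 50Gi of disk.
    print(json.dumps(team_quota("team1", "4", "20Gi", "50Gi"), indent=2))
```

A per-team monitoring stack deployed into that namespace would then count against the same quota as the team's applications.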

Another solution could be that the infra team sets up a central monitoring and logging solution and restricts access to it. App team A should not be able to see the logs or the CPU, RAM and disk usage of app team B. But I can't see a good way to do that.

It could be an option for the infra team to install that stack for each team. But in everything I have seen so far, when I install a monitoring stack into a specific namespace, the stack needs admin access to the cluster. That is not nice in my opinion.

Am I wrong?

Storage: We have GlusterFS storage and want to keep it. If a team needs a disk, we add a GlusterFS persistent volume with a specific size and a storageClassName like "team1-disk5". Based on that, the team can create a PVC and use the storage. This has worked fine in the past.
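For illustration, such a PV/PVC pair looks roughly like the output of the sketch below. It is not our real manifest: the helper `gluster_volume`, the Endpoints name `glusterfs-cluster`, the use of the class name as the Gluster volume name, the size, and the access mode are all assumptions. It prints a `v1` List as JSON, which kubectl can apply.

```python
import json


def gluster_volume(team: str, disk: str, size: str, namespace: str) -> dict:
    """Build a statically provisioned GlusterFS PV plus the PVC that binds to it."""
    class_name = f"{team}-{disk}"  # e.g. "team1-disk5"
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": class_name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteMany"],
            # A unique class name per disk pins the PVC to exactly this PV.
            "storageClassName": class_name,
            "persistentVolumeReclaimPolicy": "Retain",
            "glusterfs": {
                "endpoints": "glusterfs-cluster",  # assumed Endpoints object name
                "path": class_name,  # assumed Gluster volume name
                "readOnly": False,
            },
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": class_name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteMany"],
            "storageClassName": class_name,
            "resources": {"requests": {"storage": size}},
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [pv, pvc]}


if __name__ == "__main__":
    print(json.dumps(gluster_volume("team1", "disk5", "10Gi", "team1"), indent=2))
```

The infra team creates the PV; the team itself would normally create only the PVC part in its own namespace.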

Is this a good solution? Any other ideas?

I think that's all for the moment, just those two questions. Any ideas to point me in the right direction?


