We're planning our new Kubernetes cluster infrastructure and I have some questions.
Currently, we have one large cluster that hosts all environments (dev, staging, prod) and is shared by multiple teams. In the beginning it was just a "POC", a demo, but you know how it goes: nothing lasts longer than a temporary solution.
This setup has some general issues, and we plan to fix some of them in our target architecture.
I hope that some of you can share knowledge/experience.
First of all: one cluster per application is not a solution. The applications are really small; every team has around 3-5 applications and needs about 6-20 GB of RAM across all nodes per environment. So a dedicated cluster per application is not really an option.
We plan one cluster per environment: dev, staging (qa), prod, and maybe a demo cluster for operations.
Everything is, and will remain, automated as IaC with Terraform + Ansible (Kubespray).
Every team/application scope will get a single namespace, of course.
Our questions / problems:
Monitoring
Normally we use Prometheus and Grafana to monitor pod/cluster resource usage. The new setup should also include central logging (we're trying out solutions right now).
This is fine for the infra team, but infra doesn't want to monitor at the application level.
Is there a workable way to give the app teams their own monitoring? Something like: you (the app team) can set up alerts on logs, CPU, RAM usage, whatever you need. "You just need to roll out this Helm chart".
In an ideal world, I would give every team (so every namespace) its own monitoring stack, so that we could also limit its storage and RAM+CPU usage, and every team could use the resources it "ordered" (so a team with a lot of logs / monitoring needs would have to "order" more resources).
With that approach, they could also choose the software that suits them best.
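To make the "ordered resources" idea concrete, this is roughly how I imagine capping a team's namespace with a ResourceQuota. It is only a sketch: the namespace name, the quota name and all numbers are placeholders I made up, and the script just prints the manifest as JSON (which `kubectl apply -f` accepts as well as YAML).

```python
import json


def team_quota(namespace, cpu, memory, storage):
    """Build a ResourceQuota manifest for one team namespace.

    All values are hypothetical placeholders, not our real sizing.
    """
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                # Sum of CPU/memory requests and limits over all pods in the namespace
                "requests.cpu": cpu,
                "requests.memory": memory,
                "limits.cpu": cpu,
                "limits.memory": memory,
                # Sum of storage requested by all PVCs in the namespace
                "requests.storage": storage,
            }
        },
    }


quota = team_quota("team1", cpu="4", memory="20Gi", storage="50Gi")
print(json.dumps(quota, indent=2))
```

A per-team monitoring stack deployed into that namespace would then count against the same quota as the team's applications, which is the behaviour I'm after.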
Another solution could be that the infra team sets up a central monitoring / logging solution and restricts access. App team A should not be able to access logs / CPU usage / RAM usage / disk usage of app team B. But I can't see a good way to do that.
It could be an option for the infra team to install that stack, but from everything I have seen, when I install a monitoring stack in a specific namespace, the stack needs admin access to the cluster. This is not nice in my opinion.
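What I would hope for instead is something namespace-scoped, along these lines: a Role plus RoleBinding that only lets the stack's service account read pods, services and endpoints in its own namespace. This is a hypothetical sketch; the names `monitoring-read` and `team1-monitoring` are invented, and I have not verified that a complete monitoring stack actually works with only these permissions.

```python
import json


def monitoring_rbac(namespace, service_account):
    """Build a namespace-scoped Role and RoleBinding for a monitoring stack.

    Names are hypothetical; the rules grant read-only access inside one namespace.
    """
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": "monitoring-read", "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # core API group
                "resources": ["pods", "services", "endpoints"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "monitoring-read", "namespace": namespace},
        "subjects": [
            {
                "kind": "ServiceAccount",
                "name": service_account,
                "namespace": namespace,
            }
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": "monitoring-read",
        },
    }
    # Wrap both objects in a List so they can be applied in one go
    return {"apiVersion": "v1", "kind": "List", "items": [role, binding]}


rbac = monitoring_rbac("team1", "team1-monitoring")
print(json.dumps(rbac, indent=2))
```

Because this uses a Role rather than a ClusterRole, it cannot reach anything outside the team's namespace, so cluster-level data such as node metrics would not be covered by it.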
Am I wrong?
Storage
We have GlusterFS storage and want to keep it. If a team needs a disk, we add a "glusterfs persistent volume" with a specific size and a storageClassName like "team1-disk5".
Based on that, the team can create a PVC and use the storage. This has worked fine in the past.
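For reference, the pairing looks roughly like the sketch below. Only the storageClassName "team1-disk5" is from our real setup; the Endpoints name `glusterfs-cluster`, the Gluster volume name `team1_disk5`, the size and the access mode are placeholders for illustration. The script prints both manifests as JSON.

```python
import json

STORAGE_CLASS = "team1-disk5"  # the class name is what ties the PVC to the PV
SIZE = "10Gi"                  # placeholder size

# Created by the infra team: a statically provisioned GlusterFS volume
persistent_volume = {
    "apiVersion": "v1",
    "kind": "PersistentVolume",
    "metadata": {"name": STORAGE_CLASS},
    "spec": {
        "capacity": {"storage": SIZE},
        "accessModes": ["ReadWriteMany"],
        "storageClassName": STORAGE_CLASS,
        "persistentVolumeReclaimPolicy": "Retain",
        "glusterfs": {
            "endpoints": "glusterfs-cluster",  # hypothetical Endpoints object
            "path": "team1_disk5",             # hypothetical Gluster volume name
            "readOnly": False,
        },
    },
}

# Created by the app team in its own namespace: the matching claim
persistent_volume_claim = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "disk5", "namespace": "team1"},
    "spec": {
        "accessModes": ["ReadWriteMany"],
        "storageClassName": STORAGE_CLASS,
        "resources": {"requests": {"storage": SIZE}},
    },
}

manifests = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [persistent_volume, persistent_volume_claim],
}
print(json.dumps(manifests, indent=2))
```

So effectively every disk gets its own storage class name, and that name is the only thing tying a team's claim to the volume we prepared for it.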
Is this a good solution? Any other ideas?
I think that's all for the moment. Just those two questions. Any ideas that would point me in the right direction?
Thanks!