I have an HA kubeadm cluster with 3 control plane nodes that I need to completely replace. I have already replaced the worker nodes. How do I completely replace the control plane without downtime?
I use an LB in front for the API endpoint. This is a production environment, so I want to be sure I get it right. Failure is not an option.
My initial plan of attack is this:
- add 3 new control plane nodes. On the original control plane node (cp1), generate a control-plane join command with
sudo kubeadm token create --print-join-command --certificate-key $(sudo kubeadm init phase upload-certs --upload-certs | tail -1)
Take the output and run it on each of the 3 new hosts (see the note on timing after this list).
- wait until everything is stable and all 6 control plane nodes are Ready.
- drain the first old control plane node
- kubectl delete node <old node>
- ssh into that same host and run kubeadm reset.
- wait until everything is stable and all 5 remaining control plane nodes are Ready.
- rinse and repeat 2 more times on the remaining old nodes (the per-node commands I have in mind are sketched below).
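A note on timing for the join step: as I understand it, the certificates uploaded to the kubeadm-certs Secret and the certificate key expire after about 2 hours, so I plan to generate the join command right before joining the new hosts. Once they have joined, I would confirm all 6 nodes carry the control-plane role with something like:

kubectl get nodes -l node-role.kubernetes.io/control-plane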
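To make the teardown loop concrete, this is roughly what I intend to run for each old node, plus an etcd membership check so I can catch a stale member early. The names cp1 (the old node being removed) and cp4 (one of the new nodes) are just placeholders:

# from a machine with an admin kubeconfig
kubectl drain cp1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node cp1

# on the old host itself
sudo kubeadm reset

# check etcd membership via the etcd pod on one of the surviving nodes;
# the member list should now only show nodes still in the cluster
kubectl -n kube-system exec etcd-cp4 -- etcdctl \
  --endpoints https://127.0.0.1:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  member list -w table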
Here is where I get scared: I know you can bonk your cluster if you're not careful with etcd. During the original install we used kubeadm init on the first control plane node (cp1), then used join tokens for the rest. Does this make cp1 a special control plane node, and can it even be replaced? I have looked far and wide for answers and have not found anything really convincing or authoritative.
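My working assumption is that nothing should be pinned to cp1 as long as controlPlaneEndpoint points at the LB rather than a node IP, and each kubelet also talks to the LB. I was planning to double-check both with something like:

# should print the LB address, not cp1's IP
kubectl -n kube-system get configmap kubeadm-config -o yaml | grep controlPlaneEndpoint

# on any node: the kubelet's API server endpoint should also be the LB
sudo grep 'server:' /etc/kubernetes/kubelet.conf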
Thank you for taking a look. Once I get this right, I will offer it to the kubernetes.io group as a documentation submission.
ENV:
kubectl, kubeadm, kubelet: 1.26.0
Cilium CNI: 1.12.5
Ubuntu: 20.04
containerd: 1.6.17