
API-Server on master stops after adding second control-plane

In my current test setup I have several VMs running Debian 11. Each node has a private IP and a second WireGuard interface. In the future the nodes will be in different locations with different networks, and WireGuard is used to "overlay" all the different network environments. I want to install Kubernetes on all nodes.

node   public ip        wireguard ip

So I installed Docker and kubeadm/kubelet/kubectl version 1.23.5 on all nodes. I also installed HAProxy on every node. It works as a load balancer by listening on localhost:443 and forwarding requests to one of the online control planes.

Then I started the cluster with kubeadm

vm01> kubeadm init --apiserver-advertise-address= --pod-network-cidr=

After that I tested integrating either flannel or Calico, in each case pinned to the WireGuard interface: for flannel by adding --iface=<wireguard-interface>, for Calico by setting ...nodeAddressAutodetectionV4.interface: <wireguard-interface> in the custom manifest.

When I add a normal (worker) node, everything is fine. The node is added, pods are created, and communication goes over the defined network interface.

Without the WireGuard interface, I can also add further control planes with:

vm02> kubeadm join --token ... --discovery-token-ca-cert-hash sha256:...  --control-plane

Of course, before that I copied several files from /etc/kubernetes/pki on vm01 to vm02: ca.*, sa.*, front-proxy-ca.*, apiserver-kubelet-client.* and etcd/ca.*.
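
A minimal sketch of that copy step, assuming root SSH access from vm01 to vm02 and assuming the globs above expand to the usual kubeadm file names (.crt/.key pairs, plus sa.key and sa.pub). The helper name shared_pki_files is made up for illustration:

```shell
# Print the shared PKI files listed above, relative to /etc/kubernetes/pki.
# Assumption: the globs expand to kubeadm's default file names.
shared_pki_files() {
    for f in ca.crt ca.key sa.key sa.pub \
             front-proxy-ca.crt front-proxy-ca.key \
             apiserver-kubelet-client.crt apiserver-kubelet-client.key \
             etcd/ca.crt etcd/ca.key; do
        printf '%s\n' "$f"
    done
}

# Usage on vm01 (assumes root SSH access to vm02):
#   ssh root@vm02 'mkdir -p /etc/kubernetes/pki/etcd'
#   shared_pki_files | while read -r f; do
#       scp "/etc/kubernetes/pki/$f" "root@vm02:/etc/kubernetes/pki/$f"
#   done
```

The function only prints the list, so it can be checked before anything is copied.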

But when I use the flannel or calico network together with the wireguard interface, something strange happens after the join command.

root@vm02:~# kubeadm join --token nwevkx.tzm37tb4qx3wg2jz --discovery-token-ca-cert-hash sha256:9a97a5846ad823647ccb1892971c5f0004043d88f62328d051a31ce8b697ad4a --control-plane
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost mimas] and IPs [ ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost mimas] and IPs [ ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local mimas] and IPs []
[certs] Using the existing "apiserver-kubelet-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[kubelet-check] Initial timeout of 40s passed.
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
To see the stack trace of this error execute with --v=5 or higher

After that timeout, the API server stops working even on vm01: I cannot run any kubeadm or kubectl commands anymore, and the HTTPS service on port 6443 is dead. I understand neither why the API server on vm01 stops working when a second API server is added, nor why the output mentions the 192.168.... IPs, because the cluster should communicate only via the WireGuard network.

After finding a similar problem, I think the same solution applies here. When I add --apiserver-advertise-address=<this-wireguard-ip> to the join command, the output changes (no 192.168.. IP) and the node joins. What I still don't understand is why the API server on vm01 stops working.
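
A rough sketch of how the advertise address could be derived on the joining node instead of typed by hand. The interface name wg0 and the <control-plane-endpoint> placeholder are assumptions, and the helper names are made up for illustration. The token and CA hash stay elided as above:

```shell
# Read `ip -4 -o addr show dev <iface>` output on stdin and print the first
# IPv4 address without its prefix length.
first_ipv4() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "inet") {
                split($(i + 1), parts, "/")
                print parts[1]
                exit
            }
        }
    }'
}

# Print (do not run) the control-plane join command for a given advertise address.
# <control-plane-endpoint> is a placeholder; token and hash are left elided.
print_join_command() {
    printf 'kubeadm join <control-plane-endpoint> --token ... --discovery-token-ca-cert-hash sha256:... --control-plane --apiserver-advertise-address=%s\n' "$1"
}

# Usage on the joining node (wg0 is an assumed interface name):
#   WG_IP=$(ip -4 -o addr show dev wg0 | first_ipv4)
#   print_join_command "$WG_IP"
```

Printing the command rather than running it leaves room to check the address before the join touches the etcd cluster.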

Whatever the join command does under the hood, it needs to create an etcd service on the second control plane, and that service must run on the same IP as the flannel/Calico network interface. When the primary network interface is used, this parameter is not necessary on the second/third control plane.
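
One way to check which IP the new etcd member actually uses is to look at the address flags in the static Pod manifest that kubeadm writes. The sketch below assumes the default manifest path from the log above. The helper name etcd_url_flags is made up. A possible explanation for the vm01 outage, which the log does not confirm, is that a two-member etcd cluster needs both members for quorum, so a second member announced on an unreachable address would also stall the first API server.

```shell
# Read an etcd static Pod manifest on stdin and print only the flags that
# carry listen/advertise URLs, one per line.
etcd_url_flags() {
    grep -oE -e '--(initial-advertise-peer-urls|advertise-client-urls|listen-peer-urls|listen-client-urls)=[^ ]+'
}

# Usage on a control-plane node:
#   etcd_url_flags < /etc/kubernetes/manifests/etcd.yaml
```

If the printed URLs show the primary interface's address rather than the WireGuard one, the member was created on the wrong network.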


