API server on the master stops after adding a second control plane

In my current test setup I have several VMs running Debian 11. Each node has a private IP and a second WireGuard interface. In the future the nodes will be in different locations with different networks, and WireGuard is used to "overlay" all the different network environments. I want to install Kubernetes on all nodes.

node   primary IP       WireGuard IP
vm1    192.168.10.10    10.11.12.10
vm2    192.168.10.11    10.11.12.11
vm3    192.168.10.12    10.11.12.12
...
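
For reference, the WireGuard part on vm1 looks roughly like the minimal sketch below; the interface name wg0, the keys and the listen port are placeholders, only the addresses come from the table above.

# /etc/wireguard/wg0.conf on vm1 (sketch)
[Interface]
Address = 10.11.12.10/24
PrivateKey = <vm1-private-key>
ListenPort = 51820

[Peer]
# vm2
PublicKey = <vm2-public-key>
Endpoint = 192.168.10.11:51820
AllowedIPs = 10.11.12.11/32

[Peer]
# vm3
PublicKey = <vm3-public-key>
Endpoint = 192.168.10.12:51820
AllowedIPs = 10.11.12.12/32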

So I've installed Docker and kubeadm/kubelet/kubectl in version 1.23.5 on all nodes. I've also installed HAProxy on every node; it acts as a load balancer by listening on localhost:443 and forwarding the requests to one of the online control planes.
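
The HAProxy part is roughly the following minimal sketch; the frontend/backend names are arbitrary, 6443 is the default kube-apiserver port, and the backends could just as well use the primary IPs.

# /etc/haproxy/haproxy.cfg (excerpt, sketch)
frontend k8s-api
    bind 127.0.0.1:443
    mode tcp
    default_backend k8s-api-servers

backend k8s-api-servers
    mode tcp
    balance roundrobin
    option tcp-check
    server vm1 10.11.12.10:6443 check
    server vm2 10.11.12.11:6443 check
    server vm3 10.11.12.12:6443 check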

Then I started the cluster with kubeadm

vm01> kubeadm init --apiserver-advertise-address=10.11.12.10 --pod-network-cidr=10.20.0.0/16

After that I tried to integrate either Flannel or Calico, either by adding --iface=<wireguard-interface> to Flannel or by setting ...nodeAddressAutodetectionV4.interface: <wireguard-interface> in a custom Calico manifest.
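
With the Calico operator that setting lives in the nodeAddressAutodetectionV4 block of the Installation resource; a sketch, where wg0 stands for the actual WireGuard interface name and the pool matches the --pod-network-cidr from above:

apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  calicoNetwork:
    ipPools:
      - cidr: 10.20.0.0/16
    nodeAddressAutodetectionV4:
      interface: wg0

For Flannel the equivalent is appending --iface=wg0 to the kube-flannel container arguments in kube-flannel.yml.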

When I add a normal worker node, everything is fine: the node is added, pods are created, and the communication goes through the defined network interface.

When the cluster is set up without the WireGuard interface, I can also add further control planes with

vm2> kubeadm join 127.0.0.1:443 --token ... --discovery-token-ca-cert-hash sha256:...  --control-plane

Of course, before that I copied several files from /etc/kubernetes/pki on vm01 to vm02, namely ca.*, sa.*, front-proxy-ca.*, apiserver-kubelet-client.* and etcd/ca.*.
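
Roughly like this (a sketch; it assumes root SSH access from vm01 to vm02 and the default kubeadm paths):

# on vm01 (sketch)
ssh root@vm02 "mkdir -p /etc/kubernetes/pki/etcd"
scp /etc/kubernetes/pki/ca.* /etc/kubernetes/pki/sa.* \
    /etc/kubernetes/pki/front-proxy-ca.* \
    /etc/kubernetes/pki/apiserver-kubelet-client.* \
    root@vm02:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/etcd/ca.* root@vm02:/etc/kubernetes/pki/etcd/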

But when I use the Flannel or Calico network together with the WireGuard interface, something strange happens after the join command.

root@vm02:~# kubeadm join 127.0.0.1:443 --token nwevkx.tzm37tb4qx3wg2jz --discovery-token-ca-cert-hash sha256:9a97a5846ad823647ccb1892971c5f0004043d88f62328d051a31ce8b697ad4a --control-plane
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost mimas] and IPs [192.168.10.11 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost mimas] and IPs [192.168.10.11 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local mimas] and IPs [10.96.0.1 192.168.10.11 127.0.0.1]
[certs] Using the existing "apiserver-kubelet-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[kubelet-check] Initial timeout of 40s passed.
error execution phase control-plane-join/etcd: error creating local etcd static pod manifest file: timeout waiting for etcd cluster to be available
To see the stack trace of this error execute with --v=5 or higher

And after that timeout, the API server stops working even on vm01: I cannot run any kubeadm or kubectl commands anymore, and the HTTPS service on port 6443 is dead. But I neither understand why the API server on vm01 stops working when a second API server is added, nor can I find a reason why the output talks about the 192.168.... IPs, since the cluster should communicate only via the 10.11.12.0/24 WireGuard network.
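
While the API server on vm01 still responds, the etcd member list can be inspected roughly like this (a sketch; the pod name depends on vm01's hostname and the certificate paths are the kubeadm defaults). A half-joined second member would show up there with the peer URL it was announced with.

kubectl -n kube-system exec etcd-<vm01-hostname> -- etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
    --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
    member list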

Answer (self-answer by TRW):

After finding a similar problem in https://stackoverflow.com/questions/64227042/setting-up-a-kubernetes-master-on-a-different-ip I think this is also the solution here. When I add --apiserver-advertise-address=<this-wireguard-ip> to the join command, the output changes (no 192.168.. IPs anymore) and the node joins. What I still don't understand is why the API server on vm01 stops working.
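
The working join command thus looks roughly like this (token and hash elided as above, 10.11.12.11 being the WireGuard IP of the joining node):

vm02> kubeadm join 127.0.0.1:443 --token ... --discovery-token-ca-cert-hash sha256:... --control-plane --apiserver-advertise-address=10.11.12.11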

Whatever the join command does under the hood, it needs to create an etcd member on the second control plane, and that member must run on the same IP as the Flannel/Calico network interface. When the primary network interface is used, this parameter is not necessary on the second/third control plane.
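
The same thing can presumably be expressed with a kubeadm config file instead of the flag, via the local API endpoint of the joining control plane (a sketch for the v1beta3 API used by kubeadm 1.23; token, hash and the file name join-config.yaml are placeholders):

apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: 127.0.0.1:443
    token: <token>
    caCertHashes:
      - "sha256:<hash>"
controlPlane:
  localAPIEndpoint:
    advertiseAddress: 10.11.12.11

vm02> kubeadm join --config join-config.yaml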
