Score:0

Kubernetes: kubeadm join fails in private network

at flag

I'm trying to set up a HA Kubernetes cluster on Hetzner Cloud following this guide. I've created 6 servers, 3 control plane hosts and 3 workers. When trying to use kubeadm to join the second server to the cluster I get the following errors:

On k8s-server-1:

Jul 06 14:09:01 k8s-server-1 kubelet[8059]: E0706 14:09:01.430599    8059 controller.go:187] failed to update lease, error: rpc error: code = Unknown desc = context deadline exceeded
Jul 06 14:08:54 k8s-server-1 kubelet[8059]: E0706 14:08:54.370142    8059 controller.go:187] failed to update lease, error: rpc error: code = Unknown desc = context deadline exceeded
Jul 06 14:08:51 k8s-server-1 kubelet[8059]: E0706 14:08:51.762075    8059 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"k8s-server-1\": Get \"https://my.kubernetes.test:6443/api/v1/nodes/k8s-server-1?resourceVersion=0&timeout=10s\": context deadline exceeded"
Jul 06 14:08:47 k8s-server-1 kubelet[8059]: E0706 14:08:47.325309    8059 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-apiserver-k8s-server-1.168f32516b37209a", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-apiserver-k8s-server-1", UID:"10b8928a4f8e5e0b449a40ab35a3efdc", APIVersion:"v1", ResourceVersion:"", FieldPath:"spec.containers{kube-apiserver}"}, Reason:"Unhealthy", Message:"Readiness probe failed: HTTP probe failed with statuscode: 500", Source:v1.EventSource{Component:"kubelet", Host:"k8s-server-1"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd0ee49429a, ext:115787424848, loc:(*time.Location)(0x74c3600)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd16f1a0a1d, ext:117801107410, loc:(*time.Location)(0x74c3600)}}, Count:2, Type:"Warning", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Patch "https://my.kubernetes.test:6443/api/v1/namespaces/kube-system/events/kube-apiserver-k8s-server-1.168f32516b37209a": read tcp 192.168.178.2:60934->192.168.178.8:6443: use of closed network connection'(may retry after sleeping)
Jul 06 14:08:47 k8s-server-1 kubelet[8059]: E0706 14:08:47.324053    8059 controller.go:187] failed to update lease, error: rpc error: code = Unknown desc = context deadline exceeded
Jul 06 14:08:46 k8s-server-1 kubelet[8059]: I0706 14:08:46.986663    8059 status_manager.go:566] "Failed to get status for pod" podUID=10b8928a4f8e5e0b449a40ab35a3efdc pod="kube-system/kube-apiserver-k8s-server-1" error="etcdserver: request timed out"

On k8s-server-2:

Jul 06 14:09:04 k8s-server-2 kubelet[6685]: E0706 14:09:04.072247    6685 event.go:264] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"weave-net-9fldg.168f3252093de42e", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"weave-net-9fldg", UID:"88743b7a-aa81-4948-be9b-78c4bbf436fe", APIVersion:"v1", ResourceVersion:"714", FieldPath:"spec.initContainers{weave-init}"}, Reason:"Pulled", Message:"Successfully pulled image \"docker.io/weaveworks/weave-kube:2.8.1\" in 6.525660057s", Source:v1.EventSource{Component:"kubelet", Host:"k8s-server-2"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd1997fa82e, ext:11173601176, loc:(*time.Location)(0x74c3600)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd1997fa82e, ext:11173601176, loc:(*time.Location)(0x74c3600)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'rpc error: code = Unknown desc = context deadline exceeded' (will not retry!)
Jul 06 14:08:57 k8s-server-2 kubelet[6685]: E0706 14:08:57.993540    6685 controller.go:144] failed to ensure lease exists, will retry in 400ms, error: Get "https://my.kubernetes.test:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/k8s-server-2?timeout=10s": context deadline exceeded
Jul 06 14:08:57 k8s-server-2 kubelet[6685]: I0706 14:08:57.352989    6685 scope.go:111] "RemoveContainer" containerID="9e05ad27088c41bdd02bd0d32a16706fc6eab6e458031f0714c9a56541f8f222"
Jul 06 14:08:56 k8s-server-2 kubelet[6685]: E0706 14:08:56.992481    6685 event.go:273] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"weave-net-9fldg.168f3252093de42e", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"weave-net-9fldg", UID:"88743b7a-aa81-4948-be9b-78c4bbf436fe", APIVersion:"v1", ResourceVersion:"714", FieldPath:"spec.initContainers{weave-init}"}, Reason:"Pulled", Message:"Successfully pulled image \"docker.io/weaveworks/weave-kube:2.8.1\" in 6.525660057s", Source:v1.EventSource{Component:"kubelet", Host:"k8s-server-2"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd1997fa82e, ext:11173601176, loc:(*time.Location)(0x74c3600)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd1997fa82e, ext:11173601176, loc:(*time.Location)(0x74c3600)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'Post "https://my.kubernetes.test:6443/api/v1/namespaces/kube-system/events": read tcp 192.168.178.3:47722->192.168.178.8:6443: use of closed network connection'(may retry after sleeping)
Jul 06 14:08:56 k8s-server-2 kubelet[6685]: E0706 14:08:56.990109    6685 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"k8s-server-2\": Get \"https://my.kubernetes.test:6443/api/v1/nodes/k8s-server-2?timeout=10s\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)"
Jul 06 14:08:56 k8s-server-2 kubelet[6685]: I0706 14:08:56.989160    6685 scope.go:111] "RemoveContainer" containerID="9e05ad27088c41bdd02bd0d32a16706fc6eab6e458031f0714c9a56541f8f222"
Jul 06 14:08:56 k8s-server-2 kubelet[6685]: E0706 14:08:56.988865    6685 kubelet.go:1683] "Failed creating a mirror pod for" err="Post \"https://my.kubernetes.test:6443/api/v1/namespaces/kube-system/pods\": read tcp 192.168.178.3:47722->192.168.178.8:6443: use of closed network connection" pod="kube-system/etcd-k8s-server-2"
Jul 06 14:08:54 k8s-server-2 kubelet[6685]: E0706 14:08:54.210098    6685 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"etcd\" with CrashLoopBackOff: \"back-off 10s restarting failed container=etcd pod=etcd-k8s-server-2_kube-system(22b3a914daf1bef98cb01ddd7868523d)\"" pod="kube-system/etcd-k8s-server-2" podUID=22b3a914daf1bef98cb01ddd7868523d
Jul 06 14:08:54 k8s-server-2 kubelet[6685]: I0706 14:08:54.208472    6685 scope.go:111] "RemoveContainer" containerID="9e05ad27088c41bdd02bd0d32a16706fc6eab6e458031f0714c9a56541f8f222"
Jul 06 14:08:54 k8s-server-2 kubelet[6685]: E0706 14:08:54.208199    6685 kubelet.go:1683] "Failed creating a mirror pod for" err="rpc error: code = Unknown desc = context deadline exceeded" pod="kube-system/etcd-k8s-server-2"
Jul 06 14:08:53 k8s-server-2 kubelet[6685]: E0706 14:08:53.347043    6685 event.go:264] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"kube-proxy-2z5js.168f3250c7fc2120", GenerateName:"", Namespace:"kube-system", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ClusterName:"", ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"kube-proxy-2z5js", UID:"0ac8fe5d-7332-4a4d-abee-48c6d4dee38f", APIVersion:"v1", ResourceVersion:"711", FieldPath:"spec.containers{kube-proxy}"}, Reason:"Started", Message:"Started container kube-proxy", Source:v1.EventSource{Component:"kubelet", Host:"k8s-server-2"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd04243d720, ext:5783805064, loc:(*time.Location)(0x74c3600)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xc0312fd04243d720, ext:5783805064, loc:(*time.Location)(0x74c3600)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'rpc error: code = Unknown desc = context deadline exceeded' (will not retry!)
Jul 06 14:08:53 k8s-server-2 kubelet[6685]: I0706 14:08:53.269542    6685 scope.go:111] "RemoveContainer" containerID="e2664d16d53ff5ae6de27fe52e84651791bca1ca70a6987c9a4e3e7318eaa174"
Jul 06 14:08:47 k8s-server-2 kubelet[6685]: I0706 14:08:47.194425    6685 scope.go:111] "RemoveContainer" containerID="7aaa63419740b5e30cc76770abc92dfbabe1f48d4d812b4abc89168f73e46d51"
Jul 06 14:08:46 k8s-server-2 kubelet[6685]: I0706 14:08:46.987598    6685 status_manager.go:566] "Failed to get status for pod" podUID=778e041efc75c1983cbb59f2b3d46d09 pod="kube-system/kube-controller-manager-k8s-server-2" error="etcdserver: request timed out"
Jul 06 14:08:46 k8s-server-2 kubelet[6685]: E0706 14:08:46.986807    6685 controller.go:144] failed to ensure lease exists, will retry in 200ms, error: etcdserver: request timed out
Jul 06 14:08:46 k8s-server-2 kubelet[6685]: E0706 14:08:46.986800    6685 kubelet_node_status.go:470] "Error updating node status, will retry" err="error getting node \"k8s-server-2\": etcdserver: request timed out"

Server list: | Name | Public IP | Private IP | | --- | --- | --- | | k8s-server-1 | 192.168.178.2 | 10.23.1.2 | | k8s-server-2 | 192.168.178.3 | 10.23.1.3 | | k8s-server-3 | 192.168.178.4 | 10.23.1.4 | | k8s-worker-1 | 192.168.178.5 | 10.23.1.5 | | k8s-worker-2 | 192.168.178.6 | 10.23.1.6 | | k8s-worker-3 | 192.168.178.7 | 10.23.1.7 |

Additionally, k8s-server-* have the following firewall rules applied to them (only applies to traffic routed via public IP, not inside the private network): | Direction | Port | Source/Destination | | --- | --- | --- | | Ingress | 80 | any | | Ingress | 443 | any | | Ingress | 22 | static company IP | | Ingress | 6443 | static company IP | | Egress | any | any |

There is a load balancer inside the same network which routes traffic to k8s-server-1. It's public IP is 192.168.178.8 and the private IP is 10.23.1.8.

What I ran on both nodes:

apt-get update
apt-get install     apt-transport-https     ca-certificates     curl     gnupg     lsb-release
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo   "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
  $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
apt-get update
apt-get install docker-ce docker-ce-cli containerd.io
systemctl enable docker.service
systemctl enable containerd.service
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF

systemctl enable docker
systemctl daemon-reload
systemctl restart docker

cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

sysctl --system

apt-get update
apt-get install -y apt-transport-https ca-certificates curl
curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
apt-get update
apt-get install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl

... on server 1:

kubeadm config images pull
kubeadm init --apiserver-advertise-address=10.23.1.2 --control-plane-endpoint "my.kubernetes.test:6443" --upload-certs

mkdir ~/.kube
cp /etc/kubernetes/admin.conf ~/.kube/config

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
watch kubectl get pod -n kube-system
watch kubectl get nodes

... on server 2:

kubeadm config images pull
kubeadm join my.kubernetes.test:6443 --token XXXXX.XXXXX --discovery-token-ca-cert-hash sha256:XXXXXXXXXX --control-plane --certificate-key XXXXXXXXXX
Dawid Kruk avatar
cn flag
Do your `VM`'s have multiple interfaces (public, private) or it's using a 1:1 NAT? Are there any firewall rules for your private network? Also, for weave to work you'd need to allow the [TCP 6783 and UDP 6783/6784 ports](https://www.weave.works/docs/net/latest/faq/#ports).
mway-niels avatar
at flag
Yes, each VM has 3 interfaces by default: ens10 (private), eth0 (public) and lo (loopback). VMs in the same network can communicate freely on all ports (when using the private IP), regardless of external firewall settings. I'm running Ubuntu, ufw is inactive. Thanks for the hint regarding Weave. Does it try to communicate over the public network? If not the ports should already be allowed.
Score:1
at flag

I was able to resolve the issue by adding the --apiserver-advertise-address parameter to the kubeadm join command as well.

cn flag
I assume you added `--apiserver-advertise-address=10.23.1.2` (the IP of the first server) ?
mway-niels avatar
at flag
I used the private IP of the respective master node I'm trying to add to the cluster.
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.