I've created a new cluster on 10.0.0.100 and after a few tweaks managed to get all pods up and running:
NAME                                    READY   STATUS    RESTARTS       AGE   IP           NODE            NOMINATED NODE   READINESS GATES
coredns-6d4b75cb6d-fwjnr                1/1     Running   0              47s   10.244.0.4   ts-k8s-master   <none>           <none>
coredns-6d4b75cb6d-l6hs2                1/1     Running   0              41s   10.244.0.5   ts-k8s-master   <none>           <none>
etcd-ts-k8s-master                      1/1     Running   76 (17h ago)   22h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-apiserver-ts-k8s-master            1/1     Running   70 (17h ago)   18h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-controller-manager-ts-k8s-master   1/1     Running   79 (15h ago)   22h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-proxy-zmzdr                        1/1     Running   1 (37m ago)    21h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-scheduler-ts-k8s-master            1/1     Running   81 (17h ago)   22h   10.0.0.100   ts-k8s-master   <none>           <none>
So I'm now ready to join nodes to the cluster (IPs 10.0.0.101, 10.0.0.102, etc.), but I receive the following:
sudo kubeadm join 10.0.0.100:6443 --token l18xdm.eemusxu5rqf22gmx --discovery-token-ca-cert-hash sha256:e6451ec2e9ef26ddb1f2675e6dd7332e3d239db278516b567c7d9a33e6403ec9
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
error execution phase kubelet-start: timed out waiting for the condition
So I checked the kubelet, as I understand that it is what tells the cluster that the node is attempting to join and performs the bootstrapping. It looks like the kubelet is broken on the node:
systemctl status kubelet
* kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
`-10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Thu 2023-07-27 10:00:16 UTC; 757ms ago
Docs: https://kubernetes.io/docs/home/
Process: 191475 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 191475 (code=exited, status=1/FAILURE)
CPU: 202ms
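The status output above only shows that the kubelet exits with status 1, not why. The reason is normally in its journal, which is what the `journalctl -xeu kubelet` hint from kubeadm points at. A minimal sketch for pulling out just the error lines, assuming the journal uses the usual `kubelet[PID]: ` prefix followed by klog-style severity markers (`E` for error, `F` for fatal); the function name `kubelet_errors` is made up for illustration:

```shell
#!/usr/bin/env bash
# Keep only klog error (E) and fatal (F) lines from kubelet journal
# output read on stdin. klog lines start with a severity letter
# followed by the month and day, e.g. "E0727 10:00:16.5...".
# Assumption: journald's default output, where each message is
# prefixed with "kubelet[PID]: ".
kubelet_errors() {
  # "|| true" keeps the function from failing when nothing matches.
  grep -E 'kubelet\[[0-9]+\]: [EF][0-9]{4} ' || true
}

# Usage on the failing node (needs systemd and enough privileges):
#   sudo journalctl -u kubelet --no-pager -n 200 | kubelet_errors
```

The first error or fatal line after a restart is usually the one that explains the exit code.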
As a comparison, the kubelet on the control plane seems fine and produces the following:
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Wed 2023-07-26 15:55:15 BST; 24h ago
Docs: https://kubernetes.io/docs/home/
Main PID: 64452 (kubelet)
Tasks: 19 (limit: 2081)
Memory: 129.2M
CPU: 2h 6min 33.994s
CGroup: /system.slice/kubelet.service
└─64452 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.7
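Since the control plane's kubelet command line is visible above, one comparison that may help is listing the flags kubeadm wrote for each machine's kubelet. The sketch below reads the `/var/lib/kubelet/kubeadm-flags.env` file mentioned in the join output and prints one flag per line; it assumes the file holds a single `KUBELET_KUBEADM_ARGS="..."` line, and the function name `kubelet_flags` is hypothetical:

```shell
#!/usr/bin/env bash
# Print each flag from KUBELET_KUBEADM_ARGS on its own line, sorted,
# so the output from two machines can be compared side by side.
# Assumption: the file contains one KUBELET_KUBEADM_ARGS="..." line.
kubelet_flags() {
  local file="${1:-/var/lib/kubelet/kubeadm-flags.env}"
  sed -nE 's/^KUBELET_KUBEADM_ARGS="(.*)"$/\1/p' "$file" \
    | tr ' ' '\n' \
    | grep -E '^--' \
    | sort
}

# Usage, run on both the control plane and the worker node:
#   kubelet_flags
#   kubelet --version
```

If the flag lists or the reported versions differ between the two machines, that difference is worth checking against whatever the kubelet journal on the node complains about.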
I've updated the control plane and node firewalls to allow port 6443 (for the master) and port 10248 (not sure if that was required).
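On the firewall question: the node's kubelet config in this post has `healthzBindAddress: 127.0.0.1` and `healthzPort: 10248`, and kubeadm calls `http://localhost:10248/healthz`, so that check never leaves the node and a firewall rule for 10248 is probably not what is missing. The port the worker does need to reach is 6443 on 10.0.0.100. A rough reachability check, assuming bash and the coreutils `timeout` command are available; `check_tcp` is a made-up helper name:

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's built-in /dev/tcp redirection, so no extra tools beyond
# bash and coreutils "timeout" are assumed.
check_tcp() {
  local host="$1" port="$2"
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage from a worker node:
#   check_tcp 10.0.0.100 6443 && echo "API server reachable" \
#                             || echo "API server NOT reachable"
```

A "connection refused" on localhost:10248, as in the join output, is consistent with nothing listening there, that is, with the kubelet process not staying up.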
I believe I've got cgroups set up correctly, and containerd is running:
* containerd.service - containerd container runtime
Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2023-07-26 13:23:16 UTC; 20h ago
Docs: https://containerd.io
Main PID: 133048 (containerd)
Tasks: 10
Memory: 19.8M
CPU: 4min 9.603s
CGroup: /system.slice/containerd.service
`-133048 /usr/bin/containerd
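containerd being active does not by itself show which cgroup driver it uses. The kubelet config in this post sets `cgroupDriver: systemd`, so containerd's runc options would be expected to have `SystemdCgroup = true`. A small sketch for reading that setting, assuming the config lives at the default `/etc/containerd/config.toml`; the function name is hypothetical, and a mismatch is only one possible cause, often showing up as unstable pods rather than an immediate kubelet exit:

```shell
#!/usr/bin/env bash
# Print the SystemdCgroup value ("true" or "false") from a containerd
# config file, or "unset" if the key is absent or the file is missing.
# Assumption: default config path /etc/containerd/config.toml.
containerd_systemd_cgroup() {
  local file="${1:-/etc/containerd/config.toml}"
  local value
  value="$(sed -nE 's/^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*(true|false).*/\1/p' \
    "$file" 2>/dev/null | head -n 1)"
  echo "${value:-unset}"
}

# Usage on each machine:
#   containerd_systemd_cgroup
```

Running it on both the control plane and the node shows whether the two runtimes are configured the same way.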
Nothing seems obviously misconfigured (to me, at least) in the node's kubelet config YAML, but everything points to the node's kubelet having an issue, so it can't bootstrap and the node therefore can't join the cluster:
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /run/systemd/resolve/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
I'm at a bit of a loss as to how to proceed. Any help would be greatly appreciated.