Unable to join nodes to Kubernetes cluster; suspected kubelet error preventing kubeadm join

Daemon Jester

7/27/24, 12:29 PM

I've created a new cluster on 10.0.0.100 and after a few tweaks managed to get all pods up and running:

NAME                                    READY   STATUS    RESTARTS       AGE   IP           NODE            NOMINATED NODE   READINESS GATES
coredns-6d4b75cb6d-fwjnr                1/1     Running   0              47s   10.244.0.4   ts-k8s-master   <none>           <none>
coredns-6d4b75cb6d-l6hs2                1/1     Running   0              41s   10.244.0.5   ts-k8s-master   <none>           <none>
etcd-ts-k8s-master                      1/1     Running   76 (17h ago)   22h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-apiserver-ts-k8s-master            1/1     Running   70 (17h ago)   18h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-controller-manager-ts-k8s-master   1/1     Running   79 (15h ago)   22h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-proxy-zmzdr                        1/1     Running   1 (37m ago)    21h   10.0.0.100   ts-k8s-master   <none>           <none>
kube-scheduler-ts-k8s-master            1/1     Running   81 (17h ago)   22h   10.0.0.100   ts-k8s-master   <none>           <none>

So I'm now ready to join nodes to the cluster (IPs 10.0.0.101, 10.0.0.102, etc.), but I receive the following:

sudo kubeadm join 10.0.0.100:6443 --token l18xdm.eemusxu5rqf22gmx --discovery-token-ca-cert-hash sha256:e6451ec2e9ef26ddb1f2675e6dd7332e3d239db278516b567c7d9a33e6403ec9
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.

Unfortunately, an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'
error execution phase kubelet-start: timed out waiting for the condition

So I checked the kubelet, as I understand that it is what tells the cluster that the node is attempting to join and performs the bootstrapping. It looks like the kubelet is broken on the node:

systemctl status kubelet
* kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             `-10-kubeadm.conf
     Active: activating (auto-restart) (Result: exit-code) since Thu 2023-07-27 10:00:16 UTC; 757ms ago
       Docs: https://kubernetes.io/docs/home/
    Process: 191475 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
   Main PID: 191475 (code=exited, status=1/FAILURE)
        CPU: 202ms
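
The status output above only shows that the kubelet exited with status 1, not why. The reason is normally in the kubelet's journal, which is what kubeadm's own hint ('journalctl -xeu kubelet') points at. As a rough sketch, a small filter like the one below could cut that journal down to error and fatal lines. The function name kubelet_errors is made up for illustration, and it assumes the kubelet logs in the default klog text format.

```shell
# Print only error (E) and fatal (F) klog lines from kubelet log text on stdin.
# klog messages begin with a severity letter followed by MMDD and a timestamp,
# e.g. "E0727 10:00:16.123456 ...".
kubelet_errors() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -E '(^|[[:space:]])[EF][0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}' || true
}

# Usage on the failing node (needs systemd, and usually root):
#   journalctl -u kubelet --no-pager -n 200 | kubelet_errors
```

The last error line printed before each restart is usually the one that names the actual cause.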

For comparison, the kubelet on the control plane seems fine and produces the following:

● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Wed 2023-07-26 15:55:15 BST; 24h ago
       Docs: https://kubernetes.io/docs/home/
   Main PID: 64452 (kubelet)
      Tasks: 19 (limit: 2081)
     Memory: 129.2M
        CPU: 2h 6min 33.994s
     CGroup: /system.slice/kubelet.service
             └─64452 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.k8s.io/pause:3.7

I've updated the firewalls on the control plane and the node to allow port 6443 (for the master) and port 10248 (I'm not sure whether that was required).
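
For what it's worth, the health check that kubeadm runs against port 10248 goes to localhost, and the kubelet config in this post binds healthz to 127.0.0.1, so a firewall rule for 10248 probably makes no difference. Port 6443 on the control plane is the one the node has to reach. A minimal sketch for testing that from a worker node is below. The helper name port_open is hypothetical, and it assumes bash (for its /dev/tcp feature) and the coreutils timeout command are installed.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device, so it calls bash explicitly.
port_open() {
    # Inside the bash -c script, $0 is the host and $1 is the port.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage from a worker node, with the control-plane address from this post:
#   port_open 10.0.0.100 6443 && echo "API server reachable" || echo "API server NOT reachable"
```

A successful connection only shows that the port is open; it says nothing about whether the join token or certificates are valid.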

I believe I've got cgroups set up correctly, and containerd is running:

* containerd.service - containerd container runtime
     Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2023-07-26 13:23:16 UTC; 20h ago
       Docs: https://containerd.io
   Main PID: 133048 (containerd)
      Tasks: 10
     Memory: 19.8M
        CPU: 4min 9.603s
     CGroup: /system.slice/containerd.service
             `-133048 /usr/bin/containerd
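
The status above shows that containerd is running, but not which cgroup driver it uses. Since the kubelet config on the node sets cgroupDriver: systemd, containerd's runc options would normally need SystemdCgroup = true to match. A small check along these lines might confirm that. The function name is invented for illustration, and the default path /etc/containerd/config.toml is an assumption that does not hold if containerd is started with a different --config.

```shell
# Return 0 if the given containerd config enables the systemd cgroup driver for runc.
# The default path is an assumption; pass another path as the first argument if needed.
containerd_uses_systemd_cgroup() {
    config_file="${1:-/etc/containerd/config.toml}"
    # Matches an uncommented "SystemdCgroup = true" line; a missing file counts as "not set".
    grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$config_file" 2>/dev/null
}

# Usage on each node:
#   containerd_uses_systemd_cgroup && echo "systemd cgroup driver" || echo "not set (or file missing)"
```

Running the same check on the control plane and on the failing node gives a quick way to spot a difference between the two.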

Nothing seems obviously misconfigured (to me, at least) in the node's kubelet config YAML, but everything points to the node's kubelet having an issue, so it can't bootstrap and the node can't join the cluster:

apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /run/systemd/resolve/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s

I'm at a bit of a loss as to how to proceed. Any help would be greatly appreciated.
