Score:0

No Pods reachable or schedulable on Kubernetes cluster

I have two Kubernetes clusters in the IBM Cloud; one has two nodes, the other four.

The one with four nodes is working properly, but on the other one I had to temporarily remove the worker nodes for cost reasons (they shouldn't be paid for while sitting idle).

When I reactivated the two nodes, everything seemed to start up fine, and as long as I don't try to interact with Pods, it still looks fine on the surface: no messages about unavailability or a critical health status. Admittedly, I deleted two obsolete Namespaces which got stuck in the Terminating state, but I could resolve that issue by restarting a cluster node (I don't remember exactly which one it was).

When everything looked OK, I tried to access the Kubernetes dashboard (everything I had done before was at the IBM management level or on the command line), but surprisingly I found it unreachable, with an error page in the browser stating:

503: Service Unavailable

There was a small JSON message at the bottom of that page, which said:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": { },
  "status": "Failure",
  "message": "error trying to reach service: read tcp 172.18.190.60:39946-\u003e172.19.151.38:8090: read: connection reset by peer",
  "reason": "ServiceUnavailable",
  "code": 503
}

I ran kubectl logs kubernetes-dashboard-54674bdd65-nf6w7 --namespace=kube-system for the Pod, which was shown as running, but the result was not logs; it was this message instead:

Error from server: Get "https://10.215.17.75:10250/containerLogs/kube-system/kubernetes-dashboard-54674bdd65-nf6w7/kubernetes-dashboard":
read tcp 172.18.135.195:56882->172.19.151.38:8090:
read: connection reset by peer

Then I found out that I can neither get the logs of any Pod running in that cluster, nor deploy any new Kubernetes object that requires scheduling (I could actually apply Services or ConfigMaps, but no Pod, ReplicaSet, Deployment or similar).
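
To illustrate the symptom, the pattern looks roughly like this (the file names are just placeholders, not files from my cluster):

# fails with "connection reset by peer" for every Pod in the cluster
kubectl logs <any-pod-name> --namespace=<its-namespace>

# objects that don't need scheduling can still be applied
kubectl apply -f some-configmap.yaml

# anything that needs scheduling (Pod, ReplicaSet, Deployment, ...) does not result in a running Pod
kubectl apply -f some-deployment.yaml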

I already tried to

  • reload the worker nodes in the worker pool
  • restart the worker nodes in the worker pool
  • restart the kubernetes-dashboard Deployment

Unfortunately, none of the above actions changed the accessibility of the Pods.

There's another thing that might be related (though I'm not quite sure it actually is):

In the other cluster, which runs fine, there are three calico Pods and all three are up, while in the problematic cluster only two of the three calico Pods are up and running; the third one stays in the Pending state, and a kubectl describe pod calico-blablabla-blabla reveals the reason, an Event:

Warning  FailedScheduling  13s   default-scheduler
0/2 nodes are available: 2 node(s) didn't have free ports for the requested pod ports.
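
For anyone wondering which Pods already claim host ports on the two nodes (that is what the event complains about), a generic kubectl query along these lines should list them (just a sketch, nothing IBM-specific):

# print namespace, Pod name and any requested hostPorts, then keep only Pods that actually request one
kubectl get pods -A -o jsonpath='{range .items[*]}{.metadata.namespace}{" "}{.metadata.name}{" hostPorts: "}{.spec.containers[*].ports[*].hostPort}{"\n"}{end}' | grep 'hostPorts: .'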

Does anyone have a clue about what's going on in that cluster and can point me to possible solutions? I don't really want to delete the cluster and spawn a new one.

Edit

The result of kubectl describe pod kubernetes-dashboard-54674bdd65-4m2ch --namespace=kube-system:

Name:                 kubernetes-dashboard-54674bdd65-4m2ch
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Node:                 10.215.17.82/10.215.17.82
Start Time:           Mon, 15 Nov 2021 09:01:30 +0100
Labels:               k8s-app=kubernetes-dashboard
                      pod-template-hash=54674bdd65
Annotations:          cni.projectcalico.org/containerID: ca52cefaae58d8e5ce6d54883cb6a6135318c8db53d231dc645a5cf2e67d821e
                      cni.projectcalico.org/podIP: 172.30.184.2/32
                      cni.projectcalico.org/podIPs: 172.30.184.2/32
                      container.seccomp.security.alpha.kubernetes.io/kubernetes-dashboard: runtime/default
                      kubectl.kubernetes.io/restartedAt: 2021-11-10T15:47:14+01:00
                      kubernetes.io/psp: ibm-privileged-psp
Status:               Running
IP:                   172.30.184.2
IPs:
  IP:           172.30.184.2
Controlled By:  ReplicaSet/kubernetes-dashboard-54674bdd65
Containers:
  kubernetes-dashboard:
    Container ID:  containerd://bac57850055cd6bb944c4d893a5d315c659fd7d4935fe49083d9ef8ae03e5c31
    Image:         registry.eu-de.bluemix.net/armada-master/kubernetesui-dashboard:v2.3.1
    Image ID:      registry.eu-de.bluemix.net/armada-master/kubernetesui-dashboard@sha256:f14f581d36b83fc9c1cfa3b0609e7788017ecada1f3106fab1c9db35295fe523
    Port:          8443/TCP
    Host Port:     0/TCP
    Args:
      --auto-generate-certificates
      --namespace=kube-system
    State:          Running
      Started:      Mon, 15 Nov 2021 09:01:37 +0100
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:        50m
      memory:     100Mi
    Liveness:     http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
    Readiness:    http-get https://:8443/ delay=10s timeout=30s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /certs from kubernetes-dashboard-certs (rw)
      /tmp from tmp-volume (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sc9kw (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  kubernetes-dashboard-certs:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubernetes-dashboard-certs
    Optional:    false
  tmp-volume:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  kube-api-access-sc9kw:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node-role.kubernetes.io/master:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 600s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 600s
Events:                      <none>
Mikołaj Głodziak: Hello, it is possible that the problem is connected to the SSL certificate. Please look at [this question](https://stackoverflow.com/questions/46411598/kubernetes-dashboard-serviceunavailable-503-error) and let me know about the result. Which Kubernetes version did you use?

deHaar: @MikołajGłodziak, thanks for your suggestions. The cluster version is 1.22.2_1526 and the worker nodes have version 1.22.2_1528. The next thing I will do (again, right now) is update the cluster. I'll check the question you linked, thanks again!

Mikołaj Głodziak: And how exactly did you set up your cluster? Is it bare metal or some cloud provider? It is important to reproduce your problem. Please check my suggestion and let me know ;)

deHaar: It's a classic cluster in the IBM Cloud which I set up using the web console (and a CLI for some interactions).

deHaar: @MikołajGłodziak, could the reason be an old (maybe restored) TLS certificate that was on the first nodes (and should have been deleted weeks ago)? I can see a suspicious `Secret`...

Mikołaj Głodziak: Yes, sure, that is possible. Assuming you have a current certificate and have restored the old one (which should be removed), it is possible that the old one now looks like the newest one. However, it is out of date, so you get an error.

deHaar: Hmm, I don't have a new or current certificate, but possibly one was generated when the new nodes (or the new worker pool) came up. I have to dig into that a little deeper...

Mikołaj Głodziak: Could you also run `kubectl describe pod <your dashboard pod>` and paste the results into the question?

deHaar: It's now included in the question...

Mikołaj Głodziak: Did you check the SSL certificate issue?

deHaar: Not so far, I couldn't find out how to... The answer in the question you linked was not applicable in the IBM Cloud.

Mikołaj Głodziak: You said "could the reason be an old (maybe restored) TLS certificate that was on the first nodes (that should have been deleted weeks ago)? I can see a suspicious Secret..." Are you sure that you have only one valid certificate?

deHaar: I'm not sure about that, but the cloud provider has found out that this issue was caused by updating the cluster version past 1.21 with public and private service endpoints enabled and VRF disabled. This constellation led to my problem, which is still unresolved and will most likely stay that way. The provider says this isn't related to certificates.

deHaar: @MikołajGłodziak, thanks for your interest in this matter; please see my own answer, which I found out in a three-day fight with IBM support. Someone there finally pointed me to the solution.
Score:2

Problem resolved…

The cause of the problem was an update of the cluster to Kubernetes version 1.21 while my cluster met the following conditions:

  • private and public service endpoint enabled
  • VRF disabled

Root cause:

In Kubernetes version 1.21, Konnectivity replaces OpenVPN as the network proxy that secures the communication from the Kubernetes API server (master) to the worker nodes in the cluster.
When using Konnectivity, a problem exists with master-to-worker-node communication when all of the above-mentioned conditions are met.
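
A quick way to see whether the Konnectivity agents are present and healthy in such a cluster (a generic check; the exact Pod names may differ by provider):

kubectl get pods -n kube-system | grep -i konnectivity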

Solution steps:

  • disabled the private service endpoint (the public one does not seem to be a problem) using the command
    ibmcloud ks cluster master private-service-endpoint disable --cluster <CLUSTER_NAME> (this command is provider-specific; if you are experiencing the same problem with a different provider or on a local installation, find out how to disable that private service endpoint there)
  • refreshed the cluster master using ibmcloud ks cluster master refresh --cluster <CLUSTER_NAME>, and finally
  • reloaded all the worker nodes (I did this in the web console; it should be possible via a command as well; see the consolidated CLI sketch below this list)
  • waited for about 30 minutes:
    • Dashboard available / reachable again
    • Pods accessible and schedulable again
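
For reference, the whole sequence on the CLI looks roughly like this (IBM Cloud CLI with the Kubernetes Service plugin; <CLUSTER_NAME> and <WORKER_ID> are placeholders, and since I triggered the worker reloads in the web console, treat the last two commands as an untested sketch):

# disable the private service endpoint
ibmcloud ks cluster master private-service-endpoint disable --cluster <CLUSTER_NAME>

# refresh the cluster master so the change takes effect
ibmcloud ks cluster master refresh --cluster <CLUSTER_NAME>

# list the worker nodes, then reload each one
ibmcloud ks worker ls --cluster <CLUSTER_NAME>
ibmcloud ks worker reload --cluster <CLUSTER_NAME> --worker <WORKER_ID>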

General recommendation:

BEFORE you update any cluster to Kubernetes 1.21, check whether the private service endpoint is enabled. If it is, either disable it, delay the update until you can, or enable VRF (virtual routing and forwarding), which I couldn't do but was told would likely resolve the issue.
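
On IBM Cloud, the endpoint configuration of a cluster can be checked on the CLI, for example with something like the following (the exact field names may vary between CLI versions):

ibmcloud ks cluster get --cluster <CLUSTER_NAME> | grep -i 'service endpoint'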
