I have two Kubernetes clusters in IBM Cloud; one has 2 worker nodes, the other one 4.
The one with 4 nodes is working properly, but on the other one I had to temporarily remove the worker nodes for cost reasons (they shouldn't be paid for while sitting idle).
When I reactivated the two nodes, everything seemed to start up fine, and as long as I don't try to interact with Pods it still looks fine on the surface: no messages about unavailability or a critical health status. I did have to delete two obsolete Namespaces which got stuck in the Terminating state, but I could resolve that issue by restarting a cluster node (I don't remember exactly which one it was).
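For completeness, the kind of check I would normally do on a Namespace stuck in Terminating looks roughly like this (the namespace name is just a placeholder):
kubectl get namespace my-stuck-namespace -o yaml    # spec.finalizers / status.conditions usually name what blocks the deletion
kubectl get apiservices                             # an APIService with AVAILABLE=False often prevents namespace cleanup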
When everything looked OK, I tried to access the Kubernetes dashboard (everything I had done before was at the IBM management level or on the command line), but surprisingly I found it unreachable, with an error page in the browser stating:
503: Service Unavailable
There was a small JSON message at the bottom of that page, which said:
{
"kind": "Status",
"apiVersion": "v1",
"metadata": { },
"status": "Failure",
"message": "error trying to reach service: read tcp 172.18.190.60:39946-\u003e172.19.151.38:8090: read: connection reset by peer",
"reason": "ServiceUnavailable",
"code": 503
}
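If it helps with diagnosis: as far as I understand the 503, the API server's proxy could not reach the dashboard backend. A quick way to see whether the dashboard Service has live endpoints at all would be something like the following (assuming the Service is also named kubernetes-dashboard; the k8s-app=kubernetes-dashboard label is taken from the describe output further down):
kubectl -n kube-system get pods -l k8s-app=kubernetes-dashboard -o wide
kubectl -n kube-system get endpoints kubernetes-dashboard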
I ran kubectl logs kubernetes-dashboard-54674bdd65-nf6w7 --namespace=kube-system against the Pod, which was shown as running, but instead of logs I got this message:
Error from server: Get "https://10.215.17.75:10250/containerLogs/kube-system/kubernetes-dashboard-54674bdd65-nf6w7/kubernetes-dashboard":
read tcp 172.18.135.195:56882->172.19.151.38:8090:
read: connection reset by peer
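What strikes me is that this is the same peer as in the dashboard error above: the API server tries to reach the kubelet on 10250 and gets a reset from 172.19.151.38:8090, so it looks more like the master-to-worker connection path than a problem with the dashboard itself. I don't know which component owns port 8090 on IBM Cloud, so the grep below is just a guess at the usual tunnel suspects:
kubectl get nodes -o wide
kubectl -n kube-system get pods -o wide | grep -i -E 'vpn|konnectivity|proxy'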
Then I found out that I'm neither able to get the logs of any Pod running in that cluster, nor am I able to deploy any new custom Kubernetes object that requires scheduling (I actually could apply Services or ConfigMaps, but no Pod, ReplicaSet, Deployment or similar).
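To make "requires scheduling" concrete: even a throwaway Deployment like the one below never results in a running Pod in the broken cluster (the name and image are arbitrary placeholders):
kubectl create deployment test-nginx --image=nginx          # placeholder name and image
kubectl -n default get pods -l app=test-nginx -o wide       # the Pod never becomes Ready here
kubectl get events --sort-by=.lastTimestamp | tail -n 20    # check for scheduling/kubelet errors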
I already tried to
- reload the worker nodes in the worker pool
- restart the worker nodes in the worker pool
- restart the kubernetes-dashboard Deployment
Unfortunately, none of these actions changed the accessibility of the Pods (the commands I mean are roughly the ones sketched below).
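To be precise about what those reload/restart steps correspond to, roughly (flags from memory, cluster and worker IDs are placeholders):
ibmcloud ks worker ls --cluster my-cluster
ibmcloud ks worker reload --cluster my-cluster --worker <worker-id>
ibmcloud ks worker reboot --cluster my-cluster --worker <worker-id>
kubectl -n kube-system rollout restart deployment kubernetes-dashboard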
There's another thing that might be related (though I'm not quite sure it actually is):
In the other cluster that runs fine, there are three calico Pods and all three are up, while in the problematic cluster only two of the three calico Pods are up and running; the third one stays in the Pending state, and kubectl describe pod calico-blablabla-blabla reveals the reason in an Event:
Warning FailedScheduling 13s default-scheduler
0/2 nodes are available: 2 node(s) didn't have free ports for the requested pod ports.
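If I understand the calico setup correctly, the calico-node Pods come from a DaemonSet, use host networking and therefore host ports, so there should be exactly one per worker node, and a third one on a two-node cluster can never find free host ports. That makes me suspect a leftover Pod from one of the removed nodes; comparing the Pod list to the node list should show it (just standard listing commands):
kubectl -n kube-system get pods -o wide | grep calico
kubectl get nodes -o wide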
Does anyone have a clue about what's going on in that cluster and can point me to possible solutions? I don't really want to delete the cluster and spawn a new one.
Edit
The result of kubectl describe pod kubernetes-dashboard-54674bdd65-4m2ch --namespace=kube-system:
Name: kubernetes-dashboard-54674bdd65-4m2ch
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Node: 10.215.17.82/10.215.17.82
Start Time: Mon, 15 Nov 2021 09:01:30 +0100
Labels: k8s-app=kubernetes-dashboard
pod-template-hash=54674bdd65
Annotations: cni.projectcalico.org/containerID: ca52cefaae58d8e5ce6d54883cb6a6135318c8db53d231dc645a5cf2e67d821e
cni.projectcalico.org/podIP: 172.30.184.2/32
cni.projectcalico.org/podIPs: 172.30.184.2/32
container.seccomp.security.alpha.kubernetes.io/kubernetes-dashboard: runtime/default
kubectl.kubernetes.io/restartedAt: 2021-11-10T15:47:14+01:00
kubernetes.io/psp: ibm-privileged-psp
Status: Running
IP: 172.30.184.2
IPs:
IP: 172.30.184.2
Controlled By: ReplicaSet/kubernetes-dashboard-54674bdd65
Containers:
kubernetes-dashboard:
Container ID: containerd://bac57850055cd6bb944c4d893a5d315c659fd7d4935fe49083d9ef8ae03e5c31
Image: registry.eu-de.bluemix.net/armada-master/kubernetesui-dashboard:v2.3.1
Image ID: registry.eu-de.bluemix.net/armada-master/kubernetesui-dashboard@sha256:f14f581d36b83fc9c1cfa3b0609e7788017ecada1f3106fab1c9db35295fe523
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
--namespace=kube-system
State: Running
Started: Mon, 15 Nov 2021 09:01:37 +0100
Ready: True
Restart Count: 0
Requests:
cpu: 50m
memory: 100Mi
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Readiness: http-get https://:8443/ delay=10s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-sc9kw (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
kube-api-access-sc9kw:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 600s
node.kubernetes.io/unreachable:NoExecute op=Exists for 600s
Events: <none>