We are experiencing some issues with our GKE cluster. Here are the error messages we encountered:
When running the command kubectl logs -f pod_name, we received the following error: "Error from server: Get 'https://x.x.x.x:10250/containerLogs/default/xxx': tunnel closed."
Similarly, when trying to execute a command inside a pod using kubectl exec -it pod_name -- /bin/bash, we encountered the error: "Error from server: error dialing backend: tunnel closed."
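Since both errors point at the tunnel between the control plane and the kubelets, here are the checks we ran that only talk to the API server (and therefore do not depend on the broken tunnel). Note that the k8s-app=konnectivity-agent label is our assumption about how GKE deploys its proxy agents:

    # Confirm the nodes report Ready status (API server only, no node tunnel involved).
    kubectl get nodes -o wide

    # Check the Konnectivity agents that carry control-plane-to-node traffic;
    # the k8s-app=konnectivity-agent label is an assumption for GKE-managed clusters.
    kubectl get pods -n kube-system -l k8s-app=konnectivity-agent -o wide
    kubectl describe pods -n kube-system -l k8s-app=konnectivity-agent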
Although all nodes appear to be healthy and the kubelet is running, we noticed some errors related to the Google Metrics Agent and Autoscaler Agent:
In the Prometheus discovery module (github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:469), we encountered the error: "Failed to watch *v1.Pod: failed to list *v1.Pod: the server was unable to return a response in the time allotted, but may still be processing the request (get pods)."
Additionally, in the node collector (collectors/node.go:159), we received the error: "Failed to query API server for node data. Kind: receiver, Name: kubenode, Error: Get 'https://x.x.x.x:443/api/v1/nodes/gke-xxx-xx-pool-xxx?timeout=4.5s': net/http: request canceled (Client.Timeout exceeded while awaiting headers)."
The autoscaler is also encountering an issue: "Error while getting cluster status: timed out waiting for the condition."
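Because the agents and the autoscaler all time out talking to the API server, we also probed the API server directly from a workstation (both calls bypass the node tunnel; the readiness endpoint requires sufficient RBAC permissions):

    # Ask the API server for its own readiness report.
    kubectl get --raw='/readyz?verbose'

    # Time a simple list call to get a rough feel for API server latency.
    time kubectl get nodes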
Furthermore, in the control plane logs from the Google Cloud Console, we observed the message: "Too Many Requests" with the following details: resourceName: "apiextensions.k8s.io/v1/customresourcedefinitions."
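Since the throttling is reported against customresourcedefinitions, the number of CRDs in the cluster may be relevant; our assumption is that a large CRD count makes LIST/WATCH traffic expensive enough to trip API server throttling. For reference, the count can be checked with:

    # Count the CustomResourceDefinitions installed in the cluster.
    kubectl get crd --no-headers | wc -l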
We are also unable to schedule new pods. Even when attempting to deploy with Helm, the deployment remains stuck at 0/1.
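To show what the scheduler sees for the stuck workloads, these are the commands we use to list pending pods and read their events (pod_name and namespace are placeholders):

    # List pods that are stuck in Pending across all namespaces.
    kubectl get pods --all-namespaces --field-selector=status.phase=Pending

    # Inspect the Events section of one stuck pod for scheduling or image-pull errors.
    kubectl describe pod pod_name -n namespace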
We kindly request any assistance you can provide in resolving these issues.
Thank you.