We have created a GKE cluster and are getting errors from gke-metrics-agent. The errors show up roughly every 30 minutes, and it is always the same 62 errors.
All the errors have the label k8s-pod/k8s-app: "gke-metrics-agent".
The first error is:
error exporterhelper/queued_retry.go:245 Exporting failed. Try enabling retry_on_failure config option. {"kind": "exporter", "name": "googlecloud", "error": "rpc error: code = DeadlineExceeded desc = Deadline expired before operation could complete."
This error is followed by these errors, in order:
- go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
- /go/src/gke-logmon/gke-metrics-agent/vendor/go.opentelemetry.io/collector/exporter/exporterhelper/queued_retry.go:245
- go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
- /go/src/gke-logmon/gke-metrics-agent/vendor/go.opentelemetry.io/collector/exporter/exporterhelper/metrics.go:120
There are about 40 errors like this. Two errors that stand out are:
- error exporterhelper/queued_retry.go:175 Exporting failed. Dropping data. Try enabling sending_queue to survive temporary failures. {"kind": "exporter", "name": "googlecloud", "dropped_items": 19}
- warn batchprocessor/batch_processor.go:184 Sender failed {"kind": "processor", "name": "batch", "error": "rpc error: code = DeadlineExceeded desc = Deadline expired before operation could complete."}
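Many of the 62 entries look like stack frames that belong to the error logged just before them, not separate failures. To check that, here is a minimal Go sketch that folds such entries into the preceding event. It assumes each real event starts with a level, a `file.go:line` location and a message, as in the lines quoted above. `Event`, `GroupEvents` and `headerRe` are names I made up for illustration; none of this comes from gke-metrics-agent itself.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// headerRe matches a line that starts a new log event:
// "<level> <file.go:line> <message>". This format is an assumption
// based on the entries quoted in the question.
var headerRe = regexp.MustCompile(`^(error|warn|info|debug)\s+(\S+\.go:\d+)\s+(.*)$`)

// Event is one logical log event plus the entries that follow it
// (for example stack frames). Hypothetical type, for illustration only.
type Event struct {
	Level    string
	Location string
	Message  string
	Frames   []string
}

// GroupEvents folds every entry that does not look like a header
// into the most recent event.
func GroupEvents(lines []string) []Event {
	var events []Event
	for _, raw := range lines {
		// Drop surrounding whitespace and the quotes some entries carry.
		line := strings.Trim(strings.TrimSpace(raw), `"`)
		if line == "" {
			continue
		}
		if m := headerRe.FindStringSubmatch(line); m != nil {
			events = append(events, Event{Level: m[1], Location: m[2], Message: m[3]})
			continue
		}
		if len(events) > 0 {
			last := &events[len(events)-1]
			last.Frames = append(last.Frames, line)
		}
	}
	return events
}

func main() {
	// Entries shortened from the ones quoted above (JSON payloads omitted).
	lines := []string{
		`error exporterhelper/queued_retry.go:245 Exporting failed. Try enabling retry_on_failure config option.`,
		`go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send`,
		`/go/src/gke-logmon/gke-metrics-agent/vendor/go.opentelemetry.io/collector/exporter/exporterhelper/queued_retry.go:245`,
		`warn batchprocessor/batch_processor.go:184 Sender failed`,
	}
	for _, e := range GroupEvents(lines) {
		fmt.Printf("%s %s (%d trailing entries): %s\n", e.Level, e.Location, len(e.Frames), e.Message)
	}
}
```

Run against the full export of one 30-minute burst, this should show how many distinct events there really are behind the 62 entries.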
I searched for these errors on Google but could not find anything. I can't even find any documentation for gke-metrics-agent.
Things I have tried:
- checked quotas
- updated GKE to a newer version (the current version is 1.21.3-gke.2001)
- updated the nodes
- disabled all firewall rules
- gave all permissions to the k8s nodes
I can provide more information about our Kubernetes cluster, but I don't know which details would help solve this issue.