We have a deployment configured with an HPA based on the CPU metric. It can work fine for days, scaling pods up and down. Then, at some point, it seems to ignore the metric and scales down to a small number of pods. We usually work around this by manually setting the minimum number of pods to a value that can handle the traffic, and after an hour or two it starts scaling again.
Here is the output of the `kubectl describe hpa` command at a moment when the autoscaler is not working for us:
```
Name:               my-router-hpa
Namespace:          default
Labels:             label1=label1
                    label2=label2
Annotations:        <none>
CreationTimestamp:  Wed, 15 Sep 2021 12:19:16 +0000
Reference:          Deployment/my-router-v001
Metrics:            ( current / target )
  resource cpu on pods (as a percentage of request):  188% (943m) / 85%
Min replicas:       10
Max replicas:       100
Deployment pods:    10 current / 10 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count
Events:
  Type    Reason             Age                  From                       Message
  ----    ------             ----                 ----                       -------
  Normal  SuccessfulRescale  60m                  horizontal-pod-autoscaler  New size: 15; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  50m (x2 over 158m)   horizontal-pod-autoscaler  New size: 8; reason: cpu resource utilization (percentage of request) below target
  Normal  SuccessfulRescale  48m                  horizontal-pod-autoscaler  New size: 7; reason: cpu resource utilization (percentage of request) below target
  Normal  SuccessfulRescale  43m (x2 over 105m)   horizontal-pod-autoscaler  New size: 8; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  43m                  horizontal-pod-autoscaler  New size: 12; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  37m (x2 over 48m)    horizontal-pod-autoscaler  New size: 6; reason: cpu resource utilization (percentage of request) below target
  Normal  SuccessfulRescale  34m (x2 over 47m)    horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) below target
  Normal  SuccessfulRescale  29m (x2 over 46m)    horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) below target
  Normal  SuccessfulRescale  28m                  horizontal-pod-autoscaler  New size: 2; reason: cpu resource utilization (percentage of request) below target
  Normal  SuccessfulRescale  16m (x2 over 106m)   horizontal-pod-autoscaler  New size: 1; reason: cpu resource utilization (percentage of request) below target
  Normal  SuccessfulRescale  15m                  horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  13m (x2 over 148m)   horizontal-pod-autoscaler  New size: 10; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  13m (x3 over 123m)   horizontal-pod-autoscaler  New size: 16; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  8m3s (x2 over 129m)  horizontal-pod-autoscaler  New size: 10; reason: cpu resource utilization (percentage of request) below target
```
It reports the metric as "188% (943m) / 85%", yet the last event says "below target".
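For reference, the Kubernetes documentation gives the HPA scaling rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default) and the result clamped to the min/max replica bounds. The sketch below applies only that documented formula to the numbers in the output above. It does not model the downscale stabilization window, the special handling of unready pods or pods with missing metrics, or anything GKE-specific, and the function name `desired_replicas` is purely illustrative.

```python
import math


def desired_replicas(current_replicas: int, current_util: float, target_util: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    """Sketch of the documented HPA rule (illustrative, not the controller's code)."""
    ratio = current_util / target_util
    # If the ratio is within the tolerance band around 1.0, keep the current size.
    if abs(1.0 - ratio) <= tolerance:
        return current_replicas
    # Scale proportionally to how far the metric is from the target, rounding up.
    raw = math.ceil(current_replicas * ratio)
    # Clamp to the HPA's configured minimum and maximum replica counts.
    return max(min_replicas, min(max_replicas, raw))


# Values from the describe output: 10 pods, 188% current, 85% target, min 10, max 100.
print(desired_replicas(10, 188, 85, 10, 100))  # 23
```

Under this formula, 10 pods at 188% against an 85% target should give 23 replicas rather than 10, which is why the "below target" event and the `TooFewReplicas` condition look contradictory.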
Could you help me understand the behavior of the GKE autoscaler, or suggest a way to debug it?
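One way to sanity-check the "188% (943m)" figure is to recompute it from per-pod numbers. As far as I understand the upstream replica calculator, CPU utilization is aggregated as a ratio of sums (total usage × 100 / total requests, truncated to an integer) over the pods that reported metrics and have a CPU request, and the value in parentheses is the average usage per pod. The sketch assumes that aggregation. The function name, pod names, and usage/request values are hypothetical, chosen only to illustrate the arithmetic. In practice, usage could come from `kubectl top pods` and requests from the Deployment's pod spec.

```python
from typing import Dict, Tuple


def cpu_utilization(usage_m: Dict[str, int], request_m: Dict[str, int]) -> Tuple[int, int]:
    """Return (utilization %, average usage in millicores), aggregated as a
    ratio of sums over pods that have both a usage sample and a CPU request."""
    total_usage = 0
    total_request = 0
    counted = 0
    for pod, usage in usage_m.items():
        if pod not in request_m:
            # Pods with no known CPU request are skipped.
            continue
        total_usage += usage
        total_request += request_m[pod]
        counted += 1
    if total_request == 0:
        raise ValueError("no metrics matched pods with CPU requests")
    # Integer (truncating) division, mirroring an int32 percentage.
    utilization = total_usage * 100 // total_request
    average = total_usage // counted
    return utilization, average


# Hypothetical sample: 10 pods, each requesting 500m CPU.
samples = [900, 950, 1000, 920, 940, 960, 930, 910, 970, 950]
pod_usage = {f"pod-{i}": u for i, u in enumerate(samples)}
pod_requests = {f"pod-{i}": 500 for i in range(len(samples))}
print(cpu_utilization(pod_usage, pod_requests))  # (188, 943)
```

Because it is a ratio of sums, a few very busy pods can push the aggregate well above target even when most pods are near it. Comparing a hand-computed value with what the HPA reports at the moment it scales down may show whether the controller is seeing different per-pod data than expected.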