
GKE autoscaler sometimes doesn't scale pods


We have a deployment configured with an HPA based on the CPU metric. It works fine for days, scaling pods up and down, and then at some point it appears to ignore the metric and scales down to a small number of pods. We usually resolve this by manually setting a minimum number of pods that can handle the traffic, and after an hour or two it starts scaling again. Here is the output of `kubectl describe hpa` at a moment when the autoscaler is not working for us:

Name:                                                  my-router-hpa
Namespace:                                             default
Labels:                                                label1=label1
                                                       label2=label2
Annotations:                                           <none>
CreationTimestamp:                                     Wed, 15 Sep 2021 12:19:16 +0000
Reference:                                             Deployment/my-router-v001
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  188% (943m) / 85%
Min replicas:                                          10
Max replicas:                                          100
Deployment pods:                                       10 current / 10 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count
Events:
  Type    Reason             Age                  From                       Message
  ----    ------             ----                 ----                       -------
  Normal  SuccessfulRescale  60m                  horizontal-pod-autoscaler  New size: 15; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  50m (x2 over 158m)   horizontal-pod-autoscaler  New size: 8; reason: cpu resource utilization (percentage of request) below target
  Normal  SuccessfulRescale  48m                  horizontal-pod-autoscaler  New size: 7; reason: cpu resource utilization (percentage of request) below target
  Normal  SuccessfulRescale  43m (x2 over 105m)   horizontal-pod-autoscaler  New size: 8; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  43m                  horizontal-pod-autoscaler  New size: 12; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  37m (x2 over 48m)    horizontal-pod-autoscaler  New size: 6; reason: cpu resource utilization (percentage of request) below target
  Normal  SuccessfulRescale  34m (x2 over 47m)    horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) below target
  Normal  SuccessfulRescale  29m (x2 over 46m)    horizontal-pod-autoscaler  New size: 4; reason: cpu resource utilization (percentage of request) below target
  Normal  SuccessfulRescale  28m                  horizontal-pod-autoscaler  New size: 2; reason: cpu resource utilization (percentage of request) below target
  Normal  SuccessfulRescale  16m (x2 over 106m)   horizontal-pod-autoscaler  New size: 1; reason: cpu resource utilization (percentage of request) below target
  Normal  SuccessfulRescale  15m                  horizontal-pod-autoscaler  New size: 5; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  13m (x2 over 148m)   horizontal-pod-autoscaler  New size: 10; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  13m (x3 over 123m)   horizontal-pod-autoscaler  New size: 16; reason: cpu resource utilization (percentage of request) above target
  Normal  SuccessfulRescale  8m3s (x2 over 129m)  horizontal-pod-autoscaler  New size: 10; reason: cpu resource utilization (percentage of request) below target

It reports the metric as "188% (943m) / 85%", yet the last event says "below target".
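As I understand it from the Kubernetes documentation, the HPA recommendation is `desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue)`. A quick sanity check with the numbers above (this is my own arithmetic, not the HPA source):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_utilization: float,
                         target_utilization: float) -> int:
    """Replica count the HPA should recommend, per the documented
    formula: desired = ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# With the values from the describe output above:
# 10 pods at 188% CPU against an 85% target.
print(hpa_desired_replicas(10, 188, 85))  # 23, i.e. a scale-up, not a scale-down
```

So with 10 pods at 188% utilization the recommendation should be 23 replicas, which makes the "below target" event even more confusing to me.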

Could you help me understand the behavior of the GKE autoscaler, or suggest a way to debug it?
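In case it's relevant: one thing I'm considering as a workaround (an assumption on my part, not something confirmed to fix this) is the `behavior` field available in the `autoscaling/v2beta2` API, which can slow down scale-down. A sketch using the names from the output above:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-router-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-router-v001
  minReplicas: 10
  maxReplicas: 100
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 85
  behavior:
    scaleDown:
      # Only scale down on the highest recommendation seen
      # over the last 10 minutes, to dampen flapping.
      stabilizationWindowSeconds: 600
      policies:
        - type: Percent
          value: 50
          periodSeconds: 60
```

But I'd still like to understand why the HPA reports "below target" while the current metric is above the target.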

Comment from mario: Could you provide a way of reproducing it on a test GKE cluster?
Comment from Oleksandr Bushkovskyi (OP): @mario I don't know how to reproduce this in a test environment. I've only observed this issue in production, and not too frequently, maybe a couple of times per month.
