Score:0

How to minimize ECS autoscaling reaction time from Terraform?


When you create an ECS autoscaling policy, two alarms tag along with it: one for scaling up ("out"), one for scaling down ("in").

The scale-out ones I see created appear to sample CPU utilization (or the metric of interest) every minute, and only trigger automatic scaling when three consecutive data points have breached the threshold.

This means that if I see a traffic spike, three minutes will pass before scale-out happens. (In fact, on average the threshold breach will happen in the middle of a sampling interval, so the delay is three and a half minutes.)

I can adjust the sampling rate and the number of data points required through the AWS console web interface.

However, I would like to manage my infrastructure through Terraform.

How can I use Terraform, but no manual clickery, to shorten the time between (a) the first breach of the threshold and (b) the point in time at which scale-out begins? (Also: is this a dumb thing to attempt? Am I going about it in a bass-ackwards way?)

As far as I can tell, it looks like ice skating uphill: creating autoscaling policies (which I can do through Terraform) automatically creates two alarms and returns handles to them (see https://docs.aws.amazon.com/autoscaling/application/APIReference/API_PutScalingPolicy.html) but Terraform doesn't expose those handles (see https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/appautoscaling_policy#attributes-reference). Is it still possible in Terraform? Does it require heroic efforts?
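
For concreteness, here's the rough shape of what I'm hoping is possible (an untested sketch; the cluster name, service name, capacities and threshold below are made up): ignore the auto-created alarms entirely and wire my own CloudWatch alarm, tuned to react to a single one-minute datapoint, to a step-scaling policy for the service.

resource "aws_appautoscaling_target" "ecs_service" {
  min_capacity       = 1
  max_capacity       = 10
  resource_id        = "service/my-cluster/my-service" # made-up names
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

resource "aws_appautoscaling_policy" "scale_out" {
  name               = "my-service-scale-out"
  policy_type        = "StepScaling"
  resource_id        = aws_appautoscaling_target.ecs_service.resource_id
  scalable_dimension = aws_appautoscaling_target.ecs_service.scalable_dimension
  service_namespace  = aws_appautoscaling_target.ecs_service.service_namespace

  step_scaling_policy_configuration {
    adjustment_type         = "ChangeInCapacity"
    cooldown                = 60
    metric_aggregation_type = "Average"

    step_adjustment {
      metric_interval_lower_bound = 0
      scaling_adjustment          = 1
    }
  }
}

# My own alarm rather than the auto-created one, so I control the timing:
# one 60-second period, one breaching datapoint, then scale out.
resource "aws_cloudwatch_metric_alarm" "cpu_high" {
  alarm_name          = "my-service-cpu-high"
  namespace           = "AWS/ECS"
  metric_name         = "CPUUtilization"
  statistic           = "Average"
  period              = 60
  evaluation_periods  = 1
  datapoints_to_alarm = 1
  comparison_operator = "GreaterThanOrEqualToThreshold"
  threshold           = 70 # made-up threshold

  dimensions = {
    ClusterName = "my-cluster" # made-up names
    ServiceName = "my-service"
  }

  alarm_actions = [aws_appautoscaling_policy.scale_out.arn]
}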

Score:1

You can definitely achieve this with Terraform. There are a few ways to do it, but I will focus on the one that gives you the most flexibility.

Suppose you already have your aws_autoscaling_group resource defined. After that, you need to define the scaling policies for your ASG and the CloudWatch alarms that will trigger them. I usually track three different metrics for autoscaling: MemoryReservation, CPUReservation, and CPUUtilization.

Here is an example of how to set up autoscaling based on CPUUtilization.

Scaling policies for our ASG:
resource "aws_autoscaling_policy" "my-cpu-scale-up" {
  name = "my-cpu-scale-up"
  scaling_adjustment = 1
  adjustment_type = "ChangeInCapacity"
  cooldown = 60
  autoscaling_group_name = aws_autoscaling_group.[your-asg-resource].name
}

resource "aws_autoscaling_policy" "my-cpu-scale-down" {
  name = "my-cpu-scale-down"
  scaling_adjustment = -1
  adjustment_type = "ChangeInCapacity"
  cooldown = 300
  autoscaling_group_name = aws_autoscaling_group.[your-asg-resource].name
}
CloudWatch alarms that will trigger one of our policies:
resource "aws_cloudwatch_metric_alarm" "my-cpu-usage-high" {
  alarm_name = "my-cpu-usage-high"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  evaluation_periods = "2"
  metric_name = "CPUUtilization"
  namespace = "AWS/EC2"
  period = "60" // in seconds
  statistic = "Average"
  threshold = "70" // in %
  alarm_description = "This metric monitors the cluster for high CPU usage"
  alarm_actions = [
    aws_autoscaling_policy.my-cpu-scale-up.arn
  ]
  dimensions ={
    AutoScalingGroupName= aws_autoscaling_group.[your-asg-resource].name
  }
}

resource "aws_cloudwatch_metric_alarm" "my-cpu-usage-low" {
  alarm_name = "my-cpu-usage-low"
  comparison_operator = "LessThanOrEqualToThreshold"
  evaluation_periods = "2"
  metric_name = "CPUUtilization"
  namespace = "AWS/EC2"
  period = "60"
  statistic = "Average"
  threshold = "20"
  alarm_description = "This metric monitors my cluster for low CPU usage"
  alarm_actions = [
    aws_autoscaling_policy.my-cpu-scale-down.arn
  ]
  dimensions ={
    AutoScalingGroupName= aws_autoscaling_group.[your-asg-resource].name
  }
}

As you can see from this example, you can play around with the alarm configuration until you achieve the desired result.
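
Since the question is specifically about reaction time: the attributes that control it are period, evaluation_periods and datapoints_to_alarm. If a single one-minute breaching datapoint is acceptable for your workload (watch out for flapping), you can tighten the scale-up alarm as follows; this is the same resource as above, just with the timing tightened:

resource "aws_cloudwatch_metric_alarm" "my-cpu-usage-high" {
  alarm_name          = "my-cpu-usage-high"
  comparison_operator = "GreaterThanOrEqualToThreshold"
  period              = 60 # sample every minute
  evaluation_periods  = 1  # look only at the most recent datapoint
  datapoints_to_alarm = 1  # alarm as soon as a single datapoint breaches
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  statistic           = "Average"
  threshold           = 70
  alarm_description   = "This metric monitors the cluster for high CPU usage"
  alarm_actions       = [
    aws_autoscaling_policy.my-cpu-scale-up.arn
  ]
  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.[your-asg-resource].name
  }
}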

Hope that helps!

Jonas Kölker:
This is step scaling, right, and not target tracking? Our own research suggests that non-step-scaling solutions won't do what we want to accomplish, so it's nice to hear that step scaling solutions will (or at least might; I don't even know all the requirements, so I've only given the known relevant ones).
Vitalii Strimbanu:
@JonasKölker yes, this is the step scaling approach. Both step scaling and target tracking have their uses, so you have to choose based on your requirements. For example, I use target tracking for some ECS services to keep an average of 1K requests per minute for each task.
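
For reference, a target-tracking policy of that shape might look roughly like this in Terraform (the cluster/service names, the load balancer / target group label and the target value are all made up):

resource "aws_appautoscaling_target" "my_service" {
  min_capacity       = 1
  max_capacity       = 10
  resource_id        = "service/my-cluster/my-service" # made-up names
  scalable_dimension = "ecs:service:DesiredCount"
  service_namespace  = "ecs"
}

# Keep roughly 1K requests per task by tracking ALB request count per target.
resource "aws_appautoscaling_policy" "requests_per_task" {
  name               = "my-service-requests-per-task"
  policy_type        = "TargetTrackingScaling"
  resource_id        = aws_appautoscaling_target.my_service.resource_id
  scalable_dimension = aws_appautoscaling_target.my_service.scalable_dimension
  service_namespace  = aws_appautoscaling_target.my_service.service_namespace

  target_tracking_scaling_policy_configuration {
    target_value       = 1000 # made-up target
    scale_in_cooldown  = 300
    scale_out_cooldown = 60

    predefined_metric_specification {
      predefined_metric_type = "ALBRequestCountPerTarget"
      resource_label         = "app/my-alb/1234567890abcdef/targetgroup/my-tg/fedcba0987654321" # made-up label
    }
  }
}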