Terraform ECS Capacity Provider not resulting in new ECS instances on demand

Question

Score:0

Server

Terraform ECS Capacity Provider not resulting in new ECS instances on demand

Philip Couling

11/25/22, 3:38 PM

From what I read here ECS capacity providers should (generally) prevent tasks immediately failing on resource limits by putting them in a "Provisioning" state and spinning up a new EC2 instance

This means, for example, if you call the RunTask API and the tasks don’t get placed on an instance because of insufficient resources (meaning no active instances had sufficient memory, vCPUs, ports, ENIs, and/or GPUs to run the tasks), instead of failing immediately, the task will go into the provisioning state (note, however, that the transition to provisioning only happens if you have enabled managed scaling for the capacity provider; otherwise, tasks that can’t find capacity will fail immediately, as they did previously).

I've setup an ECS cluster, with autoscaling group and ECS capacity provider in terraform. The autoscaling group is set with min_size = 1 and it immediately spins up a single instance... so I'm confident my launch configuration is fine.

However when I repeatedly call "RunTask" through the API (tasks with memory=128), I get tasks failing to start immediately with reason RESOURCE:MEMORY. Also no new instances start up.

I can't figure out what I've misconfigured.

This was all setup in terraform:

resource "aws_ecs_cluster" "ecs_cluster" {
  name = local.cluster_name


  setting {
    name  = "containerInsights"
    value = "enabled"
  }
  tags = var.tags
  capacity_providers = [aws_ecs_capacity_provider.capacity_provider.name]

  # I added this in an attempt to make it spin up new instance 
  default_capacity_provider_strategy {
    capacity_provider = aws_ecs_capacity_provider.capacity_provider.name
  }

}

resource "aws_ecs_capacity_provider" "capacity_provider" {
  name = "${var.tags.PlatformName}-stack-${var.tags.Environment}"

  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.autoscaling_group.arn
    managed_termination_protection = "DISABLED"

    managed_scaling {
      maximum_scaling_step_size = 4
      minimum_scaling_step_size = 1
      status                    = "ENABLED"
      target_capacity           = 100
    }
  }

  tags = var.tags
}

#Compute
resource "aws_autoscaling_group" "autoscaling_group" {
  name                      = "${var.tags.PlatformName}-${var.tags.Environment}"
  # If we're not using it, lets not pay for it
  min_size                  = "1"
  max_size                  = var.ecs_max_size
  launch_configuration      = aws_launch_configuration.launch_config.name
  health_check_grace_period = 60
  default_cooldown          = 30
  termination_policies      = ["OldestInstance"]
  vpc_zone_identifier       = local.subnets
  protect_from_scale_in     = false

  tag {
    key                 = "Name"
    value               = "${var.tags.PlatformName}-${var.tags.Environment}"
    propagate_at_launch = true
  }

  tag {
    key                 = "AmazonECSManaged"
    value               = ""
    propagate_at_launch = true
  }

  dynamic "tag" {
    for_each = var.tags
    content {
      key = tag.key
      propagate_at_launch = true
      value = tag.value
    }
  }

  enabled_metrics = [
    "GroupDesiredCapacity",
    "GroupInServiceInstances",
    "GroupMaxSize",
    "GroupMinSize",
    "GroupPendingInstances",
    "GroupStandbyInstances",
    "GroupTerminatingInstances",
    "GroupTotalInstances",
  ]
}

395

1 + 0

autoscaling

terraform

amazon-ecs

Terraform ECS Capacity Provider not resulting in new ECS instances on demand

Post an answer