Why do boot time and workload performance degrade after launching a large number of T2 instances on AWS?

David Sýkora

3/10/24, 11:24 PM

I'm running a swarm of T2 instances on AWS, each with an average lifespan of 5 minutes. I start an instance, do some workload, and then terminate it. I'm using my own AMI on a GP2 volume.

After launching around 150 instances, I've noticed that both the boot time and workload performance have degraded significantly. Normally, an instance boots and the SSH server becomes available within 30-40 seconds, but now it takes around 120 seconds. Additionally, the workload has become three times slower.

I suspect that the issue may be related to CPU or I/O credit accumulation, but I'm not sure how to verify this or what other factors could be contributing to the issue.

This screenshot shows a graph of CPU credit usage per instance on AWS. The graph displays CPU credits over time for a set of T2 instances. Each line on the graph represents a different instance, with the x-axis showing the time elapsed since the instance was launched and the y-axis representing the CPU credit balance. From the graph, it is apparent that newer instances have 0 CPU credits from the beginning and accrue credits over time.

Can anyone provide insight into what could be causing this performance degradation and suggest potential solutions or optimizations to improve the performance of my T2 instances?

Is it possible for an AWS account to have a global CPU credit balance that applies to all T2 instances launched within the account, rather than a separate balance for each individual instance?

Thank you in advance for your help.


performance

amazon-ec2

amazon-web-services


David Sýkora

3/14/24, 12:01 PM

It appears that we may be running into a limit on the number of launch credits that an AWS account can receive for T2 Standard instances. According to the AWS documentation, the limit is 100 launches or starts of all T2 Standard instances combined per account, per Region, per rolling 24-hour period. If we launch or start more than 100 T2 Standard instances in a given 24-hour period, we may not receive launch credits for all of them.
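To make the rolling-window limit concrete, here is a minimal sketch that estimates which launches would fall outside the limit of 100. It is a simplified model, not AWS's actual accounting: it assumes every launch counts toward the window whether or not it received launch credits, and the function name `launches_without_credits` is made up for illustration.

```python
from collections import deque
from datetime import datetime, timedelta

LAUNCH_CREDIT_LIMIT = 100      # limit quoted from the documentation above
WINDOW = timedelta(hours=24)   # rolling 24-hour period


def launches_without_credits(launch_times, limit=LAUNCH_CREDIT_LIMIT, window=WINDOW):
    """Return the launch times that exceed `limit` within the rolling window.

    Assumption: every launch counts toward the window, with or without credits.
    """
    recent = deque()   # launches still inside the rolling window
    over = []          # launches that would not receive launch credits
    for t in sorted(launch_times):
        # Drop launches that are at least `window` older than the current one.
        while recent and t - recent[0] >= window:
            recent.popleft()
        if len(recent) >= limit:
            over.append(t)
        recent.append(t)
    return over
```

Under this model, 150 launches spaced five minutes apart all land inside one 24-hour window, so the last 50 would start without launch credits, which is consistent with the slowdown appearing after the first 100 or so instances.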

"Hidden" part of docs

It's also worth noting that newer AWS accounts may start with a lower limit, which increases over time based on usage. This could explain why we're experiencing degraded performance after launching around 150 instances.
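One way to check this per instance is to read the `CPUCreditBalance` metric from CloudWatch shortly after launch. The sketch below assumes a boto3 CloudWatch client is passed in; the helper name `earliest_credit_balance` and the 30-minute lookback are my own choices. Credit metrics are reported at five-minute intervals, so a very short-lived instance may have only one datapoint or none.

```python
from datetime import datetime, timedelta, timezone


def earliest_credit_balance(cloudwatch, instance_id, lookback=timedelta(minutes=30)):
    """Return the earliest reported CPUCreditBalance, or None if no datapoint exists yet.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch").
    """
    now = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUCreditBalance",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=now - lookback,
        EndTime=now,
        Period=300,               # credit metrics arrive every five minutes
        Statistics=["Average"],
    )
    # Datapoints are not guaranteed to be ordered, so sort by timestamp first.
    datapoints = sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
    return datapoints[0]["Average"] if datapoints else None
```

It would be called as `earliest_credit_balance(boto3.client("cloudwatch"), "<instance-id>")`. An early balance near zero on a fresh instance would be consistent with it not having received launch credits, though it does not prove that by itself.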

