Score:1

GP3 volumes do not hit throughput or IOPS limits


I'm working on a Postgres upgrade with pg_upgrade, and the meat of the process is copying the database's datafiles (unmodified) from the old cluster directory to the new one. To avoid bloating the data volume, I've attached a second EBS volume to the instance. To get the upgrade completed quickly, I've also set the throughput to its maximum value (1000 MiB/s) and left the IOPS at the default (4000) for both volumes, then waited for each volume to report that its "optimization" had completed.

However, during the upgrade process I've noticed that neither the throughput nor IOPS come close to the configured limits, even though the operations are copying large, contiguous files. Below is a snapshot of the monitoring for the volumes showing two different runs of the process.

[Image: monitoring graphs for the volumes, showing two runs of the process]

The OS is Rocky Linux 8.5, the instance is a freshly made m5a.2xlarge using the AMI built by Rocky, and the volumes are formatted as ext4. The instance CPU usage over the same period is shown below, though the OS stats showed a fair amount of IOwait.

[Image: instance CPU usage over the same period]

Would there be a parameter that I should tweak regarding these volumes, or the instance, or some OS config that I'm missing? Or is this just a symptom of the EBS backing store being too busy to actually service my needs?

Tim
What's your instance size? Was the instance created in AWS or migrated into AWS? Is Rocky Linux officially supported in AWS? I'd probably try this with Amazon Linux 2 and see if you get similar results. AWS Support is good for questions like this if you don't get help here. I asked them something similar once and got a useful reply. In my case it was about disk performance after a restore from a snapshot; they reminded me that data is streamed to EBS from S3 on demand, which can be slow. That's a different issue from yours.
Sammitch
This is a freshly-made m5a.2xlarge instance stood up from the AMI built by Rocky. I don't know if Rocky is "officially" supported by Amazon, but the AMI is certainly nitro-compatible at the least. The data was cloned from another server we use to stage non-production DB data, not from a snapshot. I'm not crazy about porting the DB over to Amazon Linux, but I suppose I could try reproducing similar results with a benchmark utility.
Tim
I wonder if a cloned instance is done with snapshots behind the scenes. Try warming (initializing) your volume to see if it helps, as it sounds a lot like the problem I had not long ago: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html. I would also try AL2 and run a generic disk throughput test on both Rocky and AL2. You probably already know that the AWS RDS / Aurora managed databases can have many benefits over databases on EC2, including being cheaper in some circumstances when redundancy and backups are needed.
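
For the generic throughput test suggested above, a minimal sketch in Python might look like the following. It is an illustration only, not a replacement for a dedicated benchmark tool: the function name `measure_sequential_read` and the 1 MiB block size are assumptions, and because it reads through the filesystem, a file that is already in the OS page cache will report a rate far above what the EBS volume delivers.

```python
import time


def measure_sequential_read(path, block_size=1024 * 1024):
    """Read `path` from start to end in fixed-size blocks.

    Returns (bytes_read, elapsed_seconds, mib_per_second).
    The name and the 1 MiB default block size are illustrative choices.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 returns a raw, unbuffered file object, so Python adds no
    # buffering of its own on top of the kernel's page cache.
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    mib_per_s = (total / (1024 * 1024)) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, mib_per_s
```

Calling `measure_sequential_read("/path/to/large/datafile")` on one large file from each volume, on both Rocky and AL2, would give a rough like-for-like number to compare against the volume monitoring graphs. The path here is a placeholder.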