Is anyone else experiencing problems after recent automated AWS Aurora MySQL database engine updates (specifically the upgrade from "5.7.mysql_aurora.2.10.3" to "5.7.mysql_aurora.2.11.2")?
- Over the weekend our web servers started reporting issues and were becoming non-responsive at times.
- Troubleshooting today showed that since Thursday morning we have been regularly hitting max_connections=80 on an AWS Aurora database that had been running very lightly loaded beforehand.
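For anyone wanting to check the same thing: one way to see who is holding the connections is to group the rows of information_schema.PROCESSLIST by client host and command. This is only a rough sketch. It assumes the rows have already been fetched as dicts with whatever MySQL client you use, and the function name and the sample rows in the usage note are made up for illustration.

```python
from collections import Counter


def connections_by_client(rows):
    """Count connections per (client host, command) pair.

    `rows` are dicts holding the HOST and COMMAND columns of
    information_schema.PROCESSLIST; HOST normally looks like "10.0.1.15:51234".
    """
    counts = Counter()
    for row in rows:
        # Drop the ephemeral client port so all connections from one host group together.
        host = (row.get("HOST") or "").rsplit(":", 1)[0]
        counts[(host, row.get("COMMAND"))] += 1
    # Most common first, so the biggest connection holder is at the top.
    return counts.most_common()
```

A large count of "Sleep" connections from a single web server would point at idle pooled connections rather than real query load, e.g. connections_by_client([{"HOST": "10.0.1.15:51234", "COMMAND": "Sleep"}]) (hypothetical row).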
Further digging showed two step changes in the DatabaseConnections metric (see plot 1):
- The most recent step change, on May 25, corresponds to an automated DB engine upgrade (5.7.mysql_aurora.2.10.3 --> 5.7.mysql_aurora.2.11.2) during the DB instance maintenance window...
- ...and an earlier step change, on May 7, corresponds to an unknown change during an earlier DB cluster maintenance window.
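If you want to look for the same kind of step change in your own DatabaseConnections data without eyeballing a plot, here is a minimal sketch. It assumes you have exported the metric as a plain list of numbers (for example one value per day). The function name, the thresholds and the sample values are all made up for illustration; they are not our actual data.

```python
from statistics import median


def find_step_changes(values, window=3, min_jump=10):
    """Return indexes where the series jumps to a new sustained level.

    A step is flagged at index i when the next `window` samples ALL differ
    from the median of the current level by at least `min_jump`, so a
    single spike is ignored.
    """
    steps = []
    start = 0  # index where the current level began
    for i in range(1, len(values) - window + 1):
        level = median(values[start:i])
        upcoming = values[i:i + window]
        if all(abs(v - level) >= min_jump for v in upcoming):
            steps.append(i)
            start = i  # measure the next level from here
    return steps


# Hypothetical daily connection counts with two sustained jumps.
sample = [5, 6, 5, 7, 30, 31, 29, 32, 70, 72, 71, 69]
print(find_step_changes(sample))
```

The indexes returned can then be matched against the dates of your maintenance windows.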
Other factors:
- CPU usage has been impacted only slightly (see plot 2)
- DB queries and usage patterns are unchanged.
- There's nothing particularly interesting in the release notes for this DB engine update.
We have put a few workarounds in place, but it looks like something in the recent Aurora MySQL DB engine update is adversely impacting performance, and I'm keen to hear from others who may be experiencing similar issues... and any solutions you may have found.
EDIT 6/6/2023
This Aurora DB seems to have now magically healed itself (ironically during another cluster maintenance window) - see plot below:
Note: based on analysis of our web servers, we had migrated a legacy web server from BlueHost (USA) to AWS (Sydney) to try to cope with the underperforming Aurora instance. This migration was implemented on June 2 and resulted in a step-change improvement in performance... but then during the cluster maintenance window on June 4, the DB magically returned to pre-disruption performance once again!
Here's the final before/after profile:
I wish I could find a way to audit what is being done during each Aurora maintenance window!
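The closest thing I'm aware of is the RDS event log, which can be pulled with boto3's describe_events call. A rough sketch follows, with some assumptions: the cluster identifier in the usage comment is hypothetical, AWS credentials and region are already configured, and as far as I know RDS only keeps events for about 14 days. The events record things like engine upgrades and restarts, so they may not explain every change made during a maintenance window.

```python
def fetch_rds_events(source_id, source_type="db-cluster", minutes=20160):
    """Fetch recent RDS events for one cluster or instance.

    `minutes` is how far back to look; 20160 minutes is 14 days.
    """
    import boto3  # imported here so format_events() can be used without boto3

    rds = boto3.client("rds")
    events = []
    paginator = rds.get_paginator("describe_events")
    for page in paginator.paginate(
        SourceIdentifier=source_id,
        SourceType=source_type,  # "db-cluster" or "db-instance"
        Duration=minutes,
    ):
        events.extend(page["Events"])
    return events


def format_events(events):
    """Turn event dicts into 'timestamp source: message' lines, oldest first."""
    lines = []
    for e in sorted(events, key=lambda e: e["Date"]):
        lines.append(f'{e["Date"]:%Y-%m-%d %H:%M} {e["SourceIdentifier"]}: {e["Message"]}')
    return lines


# Usage (hypothetical cluster name):
# for line in format_events(fetch_rds_events("my-aurora-cluster")):
#     print(line)
```

Running it once for the cluster ("db-cluster") and once for each instance ("db-instance") covers both kinds of maintenance window. Saving the output regularly would work around the short retention.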