We recently noticed that our Google App Engine project failed periodically, every 25 hours and 10 minutes (1510 minutes), on three consecutive days for no apparent reason.
During each incident, requests failed with status code 499 (Client Closed Request) after a very long request duration (10 s). Requests normally take a few hundred milliseconds, occasionally 2-3 seconds, but never close to 10 seconds. We saw no uptick in traffic at the time, and we don't run any background jobs. CPU and memory were both fine until the issue started; CPU then rose somewhat (e.g. from around 10% to 60%) and even triggered a temporary scale-up from 3 to 5 hosts.
The project is a Python FastAPI image deployed to the App Engine flexible environment, with a minimum of 3 and a maximum of 12 hosts at the time.
The timing of these failures was interesting: they happened almost exactly 25 hours and 10 minutes apart. We had a few deployments during these days at various times, and there is no correlation with server uptime either.
The timestamps below are in UTC:
2021-11-17 17:43
2021-11-18 18:53
2021-11-19 20:03
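
For reference, the spacing between the timestamps above can be checked with a few lines of Python. This is only a quick sketch: the variable names are ours, and the projected next occurrence assumes the 1510-minute pattern keeps holding, which we have not confirmed.

```python
from datetime import datetime, timedelta

# Failure times copied from the list above (UTC).
failures = [
    datetime(2021, 11, 17, 17, 43),
    datetime(2021, 11, 18, 18, 53),
    datetime(2021, 11, 19, 20, 3),
]

# Gap between each pair of consecutive failures, in whole minutes.
gaps_min = [
    int((later - earlier).total_seconds() // 60)
    for earlier, later in zip(failures, failures[1:])
]
print(gaps_min)  # [1510, 1510]

# Assumption: if the same interval repeats, the next failure would land here.
next_expected = failures[-1] + timedelta(minutes=gaps_min[-1])
print(next_expected)  # 2021-11-20 21:13:00
```

Both gaps come out to exactly 1510 minutes at minute resolution, which is why the failures look like a fixed-interval timer rather than a daily schedule.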
Has anyone seen anything similar on Google App Engine, or perhaps with the FastAPI image mentioned above?