Our production environment contains two ALBs: a public-facing one and a private one. Both ALBs support HTTP/2.
I have a target group that uses HTTP/1.1 and contains an ECS service. The very strange thing I'm observing is:
- When requests are made to this service via either of the ALBs, approximately 1 out of 5 requests fails with a 504 Gateway Timeout.
- When I make requests to the service's IP address directly (from an EC2 instance in the same VPC), I don't get any such timeouts.
- An older version of the same application works without 504s via either ALB.
The idle timeout on the ALBs is set to 30s. In the application it is set to 60s (nginx), and the proxied service has the same value.
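For reference, this is roughly where those 60s values live on the nginx side; treat it as a sketch, since the exact directives and values in our real config may differ slightly:

```
# Sketch of the nginx timeout directives (names/values approximate,
# not copied verbatim from the production config).
http {
    keepalive_timeout  60s;   # idle client (ALB-facing) connections
    proxy_read_timeout 60s;   # waiting for a response from the upstream application
    proxy_send_timeout 60s;   # sending the request to the upstream application
}
```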
I've compared the response headers in both servers, but they are identical.
My question here is: what should I be looking at as the potential culprit? I know the usual keep-alive caveats can be a big problem, but again, two different versions of the same application behave differently behind the same ALBs, and I've found very little to help me debug this.
The current architecture is:
Client -> [ AWS ALB ] -> [ AWS ECS: Docker container ]

Within the Docker container I have:

[ nginx ] -> [ application ]
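To be concrete about that last hop, the in-container nginx is just a reverse proxy in front of the application process. A simplified sketch of that setup (the upstream port and keep-alive values here are placeholders, not our exact config):

```
# Simplified sketch of the in-container reverse proxy.
# Port and keepalive values are placeholders, not the real config.
upstream app {
    server 127.0.0.1:8080;    # the application process in the same container
    keepalive 32;             # pool of idle upstream connections
}

server {
    listen 80;

    location / {
        proxy_pass         http://app;
        proxy_http_version 1.1;           # needed for upstream keep-alive
        proxy_set_header   Connection "";
    }
}
```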
Another notable point: I cannot reproduce the issue in our staging environment, which uses the same architecture; the only difference is that it hosts the Docker container directly on AWS EC2 instead of ECS.
ECS CPU/memory usage looks nominal.