
AWS Load Balancer 502 Bad Gateway


I have multiple Node web servers hosted on EC2 behind a load balancer, and some users are getting a 502 Bad Gateway before the request even reaches a server.

The servers have no log entries for those requests, which is why I assume the requests never reach them.

I had a similar problem before, and I had to set `keepAliveTimeout` and `headersTimeout` in the Node server configuration.
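For reference, a minimal sketch of that kind of fix, assuming an Application Load Balancer whose idle timeout is still the AWS default of 60 seconds (check the idle timeout attribute on your own load balancer). The usual rule is that Node's `keepAliveTimeout` must be longer than the load balancer's idle timeout, so Node never closes a kept-alive connection that the load balancer is about to reuse, and `headersTimeout` should be slightly longer than `keepAliveTimeout`. The 5 s and 1 s margins, `ALB_IDLE_TIMEOUT_MS`, `keepAliveSettings`, and `createAppServer` are illustrative assumptions.

```typescript
import * as http from "http";

// Assumption: the ALB idle timeout is left at the AWS default of 60 seconds.
const ALB_IDLE_TIMEOUT_MS = 60_000;

// Derive Node's timeouts from the load balancer's idle timeout so that the
// load balancer, not Node, is always the side that closes idle connections.
export function keepAliveSettings(albIdleTimeoutMs: number): {
  keepAliveTimeout: number;
  headersTimeout: number;
} {
  const keepAliveTimeout = albIdleTimeoutMs + 5_000; // outlive the ALB idle timeout
  const headersTimeout = keepAliveTimeout + 1_000; // keep above keepAliveTimeout
  return { keepAliveTimeout, headersTimeout };
}

// Hypothetical app server; replace the handler with your real request listener.
export function createAppServer(): http.Server {
  const server = http.createServer((_req, res) => {
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("ok");
  });
  const { keepAliveTimeout, headersTimeout } = keepAliveSettings(ALB_IDLE_TIMEOUT_MS);
  server.keepAliveTimeout = keepAliveTimeout;
  server.headersTimeout = headersTimeout;
  return server;
}
```

Usage would be something like `createAppServer().listen(3000)`, with the port being an assumption. Deriving both values from the idle timeout keeps them consistent if the idle timeout on the load balancer is ever changed.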

I have a few unhealthy instances every day, but the times when that happens don't always match the times of the 502 errors. Should I increase the health check timeout from 5s to 10s and see what happens?

The memory and CPU usage seem fine.

Any tips on how I should debug this issue?
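One way to check whether the 502s ever reach an instance is to enable the load balancer's access logs and inspect the 502 entries. A minimal sketch, assuming an Application Load Balancer and its standard space-separated access log format, in which the 9th field is `elb_status_code`, the 10th is `target_status_code`, and `target_processing_time` is `-1` when the request could not be dispatched or the target closed the connection early. A `-` target status means the target sent no response. The helper names are hypothetical, and the tokenizer does not handle escaped quotes inside quoted fields.

```typescript
// Hypothetical shape of a 502 entry pulled from an ALB access log line.
export interface Alb502 {
  time: string;
  target: string; // "ip:port" of the chosen target, or "-"
  targetStatus: string; // "-" means the target returned no response
  targetProcessingTime: number; // -1 if no target response was received
  request: string;
}

// Split a log line on spaces while keeping double-quoted fields intact.
export function tokenize(line: string): string[] {
  return (line.match(/"[^"]*"|\S+/g) ?? []).map((t) =>
    t.length >= 2 && t.startsWith('"') && t.endsWith('"') ? t.slice(1, -1) : t,
  );
}

// Returns the parsed entry if the load balancer answered 502, else null.
export function parse502(line: string): Alb502 | null {
  const f = tokenize(line);
  if (f.length < 13 || f[8] !== "502") return null;
  return {
    time: f[1],
    target: f[4],
    targetStatus: f[9],
    targetProcessingTime: Number(f[6]),
    request: f[12],
  };
}

// Count 502s, and how many of them never got a response from the target.
export function summarize502s(lines: string[]): { total: number; noTargetResponse: number } {
  let total = 0;
  let noTargetResponse = 0;
  for (const line of lines) {
    const entry = parse502(line);
    if (!entry) continue;
    total++;
    if (entry.targetStatus === "-") noTargetResponse++;
  }
  return { total, noTargetResponse };
}
```

If most 502s in a bad window show a target `ip:port` but a `-` target status, the load balancer did pick an instance and got no response back, which is consistent with the instance closing the connection (for example, a keep-alive timeout mismatch) rather than with a capacity problem.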


You already know the answer: unhealthy instances. Even if the times don't match, you should fix that problem first and then check whether the other issues persist.

Increase the instance size, increase the ELB health check timeouts, scale up the machines, and check whether that helps.

soltex
Yes, you are right! I will start by increasing the health check timeouts. The memory and CPU usage actually look fine to me, which is why I'm not sure I should upgrade the machines. Anyway, I will give it a try if the health check timeouts don't help.

exeral
The size may not solve your problem since your metrics are OK, but it is easy and cheap to bump the size for an hour, so it is still worth a try.

soltex
Increasing the health check timeouts slightly reduced the number of unhealthy instances, but the number of 502 errors stayed the same. I will try bumping the instance size since, as you said, it is still worth a try.

soltex
Bumping the instance size didn't work. Do you have any other ideas? I don't even understand why I have unhealthy instances if the metrics are OK.

exeral
What is your health check, and what are the corresponding logs on the EC2 instances for those health checks?

soltex
This is my health check: `Unhealthy threshold`: 2 consecutive health check failures (the healthy threshold is also 2), `Timeout`: 5s, `Interval`: 10s, `Algorithm`: Round robin. The logs look like this: `GET /health-check 200 0ms`. Unfortunately, I don't have the logs from the instance that was marked unhealthy. I might enable that and check the response time right before the instance was terminated.
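To capture what happens right before an instance is marked unhealthy, here is a minimal sketch of a health check handler that logs a timestamp, the handler time, and the recent event-loop delay. A `0ms` handler time only covers the work inside the handler; if the event loop were blocked, a health check request could wait past the 5s timeout before the handler even runs, and only an event-loop delay measure would reveal that. It assumes Node's `perf_hooks.monitorEventLoopDelay` (available since Node 11.10) and reuses the `/health-check` path from the log line above; `createHealthCheckHandler` and `healthLogLine` are hypothetical names.

```typescript
import * as http from "http";
import { monitorEventLoopDelay, performance } from "perf_hooks";

// Path taken from the existing health check log line.
const HEALTH_PATH = "/health-check";

// Format one log line with an ISO timestamp so it can be lined up with the
// time the target group marked the instance unhealthy.
export function healthLogLine(now: Date, status: number, handlerMs: number, loopP99Ms: number): string {
  return `${now.toISOString()} GET ${HEALTH_PATH} ${status} ${handlerMs.toFixed(1)}ms loop-p99=${loopP99Ms.toFixed(1)}ms`;
}

// Returns a handler that answers the health check and returns false for other paths.
export function createHealthCheckHandler(): (req: http.IncomingMessage, res: http.ServerResponse) => boolean {
  // Samples event-loop delay every 20 ms; the histogram reports nanoseconds.
  const loopDelay = monitorEventLoopDelay({ resolution: 20 });
  loopDelay.enable();

  return (req, res) => {
    if (req.url !== HEALTH_PATH) return false; // not a health check; let the app handle it
    const start = performance.now();
    res.writeHead(200, { "Content-Type": "text/plain" });
    res.end("ok");
    const handlerMs = performance.now() - start;
    const loopP99Ms = loopDelay.percentile(99) / 1e6; // ns -> ms
    loopDelay.reset(); // next log line covers only the time since this health check
    console.log(healthLogLine(new Date(), 200, handlerMs, loopP99Ms));
    return true;
  };
}
```

Inside an existing request listener, this could be wired up as `const handleHealth = createHealthCheckHandler();` followed by `if (handleHealth(req, res)) return;`. A rising `loop-p99` in the entries just before a failure would point at event-loop blocking rather than at CPU or memory limits.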

