
How could a request be dropped when sending it to Node.js?


I have an AWS ALB that load balances requests round-robin to four servers.

Each server uses pm2 to round-robin those requests to six CPUs.

Node.js processes (a React/Next.js app) are running on each of those six CPUs, served by Express.js. One of the first things each process does is log the incoming request. (They are not fronted by a web server like Apache or nginx; traffic goes straight to Express.js.)
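
Roughly, each worker is an Express app whose very first middleware logs the incoming request, along these lines (a simplified sketch for illustration, not the actual dist/server.js):

// simplified sketch of one worker (illustrative, not the real dist/server.js)
const express = require('express');

const app = express();

// First middleware in the chain: log every request that reaches this process
app.use((req, res, next) => {
  console.log(`${new Date().toISOString()} pid=${process.pid} ${req.method} ${req.originalUrl}`);
  next();
});

// ...Next.js request handling and application routes go here...

// pm2 cluster mode shares this listening port across the six workers
app.listen(process.env.PORT || 3000);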

Usually every single request that hits the ALB gets successfully forwarded and logged by the Node.js process. However, at high-traffic times some requests just get dropped and never make it to the Node.js process. Obviously our server logs don't record these failures, since the requests never arrive in the first place; we only see the gap by comparing against the ALB request counts.

I'm trying to understand the mechanism that could lead to them being dropped. Could it be that an internal Node.js queue times out? Or could it be a Linux kernel thing? We are seeing indications that during periods of higher traffic some of the CPUs are busy while others are idle, which makes me think of queue length (Kingman's formula, Little's law, etc.). I can think of a few ways to decrease the probability of this happening, from increasing server capacity, to reducing response time, to changing the server-level load-balancing strategy, but I'm mainly trying to understand where the request actually gets stuck and what determines whether and how it drops or disappears, especially whether I could log it or emit some kind of signal when it happens.
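
To make the last point concrete, the kind of instrumentation I have in mind is hooking the Node http.Server events that fire for connections which never turn into a logged request. This is a sketch only, assuming the Express setup above; the handlers, timeout values, and counter names are illustrative, not our production code:

// sketch: surface connection-level events that never show up in the request log
// (illustrative only; our real server.js differs)
const express = require('express');

const app = express();
const server = app.listen(process.env.PORT || 3000);

// Count raw TCP connections accepted by this worker; comparing this with the
// number of logged HTTP requests shows whether the gap is before or after accept()
let acceptedConnections = 0;
server.on('connection', () => { acceptedConnections += 1; });

// Requests that fail before reaching Express routing (parse errors, etc.)
server.on('clientError', (err, socket) => {
  console.error('clientError:', err.code);
  socket.end('HTTP/1.1 400 Bad Request\r\n\r\n');
});

// Enable an inactivity timeout so idle sockets are surfaced instead of lingering
server.setTimeout(120 * 1000);
server.on('timeout', (socket) => {
  console.error('socket timeout');
  socket.destroy();
});

// Node's default keepAliveTimeout (5s) is shorter than the ALB idle timeout
// (default 60s), which can make the target close a connection the ALB is about
// to reuse; raising these above the ALB idle timeout is a common mitigation
server.keepAliveTimeout = 65000;
server.headersTimeout = 66000;

// Periodically emit the counter so it can be compared with the request log
setInterval(() => {
  console.log({ pid: process.pid, acceptedConnections });
}, 60000).unref();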

Snippets of pm2 config:

module.exports = {
  apps: [
    {
      name: 'community',
      script: 'dist/server.js',
      instances: -1,        // negative value: pm2 launches (CPU count - 1) workers
      exec_mode: 'cluster', // cluster mode: pm2 round-robins connections across workers
      autorestart: true,
      watch: false,
      log_date_format: 'YYYY-MM-DD HH:mm Z',
      max_memory_restart: '2G',
// ...
// and env-specific configs, such as
      env_production: {
        NODE_ENV: 'production',
        NODE_OPTIONS: '--max-old-space-size=3584 --max-http-header-size=16380',
        LOG_LEVEL: 'INFO',
        PORT: 3000,
      },
    },
  ],
  deploy: {
// ...
  },
};
Michael Hampton:
Can you explain in more detail exactly how "Each server uses pm2 to round-robin those requests to six CPUs"? It would be preferable to just show your configuration for the entire stack, as it's not possible yet to rule out any part of it.
OP:
pm2 is a Node.js process manager that acts as a cluster to farm work out to the CPUs. It load-balances these requests in a round-robin fashion. But my question is more general: in a scenario where traffic is sent to a server that has a Node.js process serving traffic, under what circumstances would Node.js never serve that request? I'm seeing more requests at the LB level than at the server level.
Michael Hampton:
I already know what pm2 is. I am waiting to see your configuration.
OP:
ah, thanks for clarifying. I added it to the question.