Score:0

Debian 11 VM sudden CPU overload due to Node.js


I'm currently running into an issue where one of our Proxmox VMs, running Debian 11, suffers from very sudden CPU overloads. This happened last week and again today. When it happens, the server is completely unresponsive. We can't even access it through the Proxmox console, as it won't accept any input. This is what the CPU graph (average) looks like:

[Image: CPU usage graph (average)]

Memory, network, and disk usage don't show any sudden spikes when this happens; only the CPU maxes out. The VM has two virtual cores, so I suppose the problem lies with an application that runs on a single core.

The VM is used for several customer projects as a staging environment. There are several applications running including PostgreSQL, Node.js and PHP. We have a New Relic agent running on the machine and have checked the process history:

[Image: New Relic process history]

As you can see, a Node.js application seems to be the culprit, but the affected process doesn't show any details. The question is: how do we diagnose this? There are multiple Node.js apps running through PM2 on the machine. As we can't access the Proxmox console or SSH into the machine when this happens, we are unable to check the PM2 process list at that moment. We have checked various logs in /var/log but were unable to find anything related to this.
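
Since the PM2 process list can't be inspected during an incident, one option is to record the PID-to-app mapping ahead of time. The following is only a minimal sketch: it assumes the `pm2 jlist` command prints the process list as a JSON array on a single line, and that each entry carries `name` and `pid` fields. The function names are hypothetical.

```python
import json
import subprocess


def parse_jlist(raw):
    """Map PID -> PM2 app name from the text printed by `pm2 jlist`."""
    for line in raw.splitlines():
        line = line.strip()
        if not line.startswith("["):
            continue
        try:
            apps = json.loads(line)
        except json.JSONDecodeError:
            continue  # not JSON, e.g. a "[PM2] ..." notice line
        # Stopped apps report pid 0, so they are skipped here.
        return {app["pid"]: app["name"] for app in apps if app.get("pid")}
    return {}


def current_pm2_pids():
    """Run `pm2 jlist` as the current user and return the PID -> name map."""
    out = subprocess.run(
        ["pm2", "jlist"], capture_output=True, text=True, check=True
    )
    return parse_jlist(out.stdout)
```

Calling `current_pm2_pids()` periodically as the user that owns the PM2 daemon and writing the result to a file would give a record that can be matched against a PID after the machine has been restarted.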

Any ideas?

Jaromanda X
I gather the Entity doesn't help identify the rogue process? It seems the rogue node process is named just node with user webdeploy, while the other node processes under the same user have different process names. How many node processes are there (it looks like 4)? What does each one do? Is there a likely culprit based on what each node process does?
Maximilian Krause
@JaromandaX Yes, I'd suppose the culprit is the activity of one of the node processes. There are 7 node processes running, each of them running a different PM2 container (or PM2 itself). In the Linux process list (`ps`), they all have more information attached to them (such as the path of the PM2 server), so I can't fully identify the culprit from the New Relic entity alone. It would already help if we could set up something to gather more information the next time this happens.
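
As an illustration of what "something to gather more information" could look like, here is a rough sketch of a sampler that reads `/proc` at a fixed interval and appends the processes with the largest CPU time increase, together with their full command line, to a log file. The log path, interval, and function names are assumptions made for this example, not part of the setup described above.

```python
import os
import time

LOG_PATH = "/var/log/cpu-sampler.log"  # assumed location, needs write access
INTERVAL_S = 10                         # assumed sampling interval
TOP_N = 5                               # assumed number of entries per sample


def parse_stat(text):
    """Return (pid, cpu_ticks) from the contents of /proc/<pid>/stat."""
    pid = int(text[:text.index(" ")])
    # The command name is in parentheses and may contain spaces or
    # parentheses itself, so everything is split after the last ')'.
    fields = text[text.rindex(")") + 2:].split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return pid, int(fields[11]) + int(fields[12])


def read_cmdline(pid):
    """Return the full command line of a process, or '' if it is gone."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            raw = f.read()
    except OSError:
        return ""
    return raw.replace(b"\0", b" ").decode(errors="replace").strip()


def snapshot():
    """Map pid -> cumulative CPU ticks for every readable process."""
    ticks = {}
    for name in os.listdir("/proc"):
        if not name.isdigit():
            continue
        try:
            with open(f"/proc/{name}/stat") as f:
                pid, used = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or its stat line was unreadable
        ticks[pid] = used
    return ticks


def top_consumers(prev, curr, n=TOP_N):
    """Return the n (pid, delta_ticks) pairs with the largest increase."""
    deltas = [(pid, used - prev[pid]) for pid, used in curr.items() if pid in prev]
    deltas = [d for d in deltas if d[1] > 0]
    return sorted(deltas, key=lambda d: d[1], reverse=True)[:n]


def run():
    """Sample forever, appending the top consumers of each interval to the log."""
    prev = snapshot()
    while True:
        time.sleep(INTERVAL_S)
        curr = snapshot()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        with open(LOG_PATH, "a") as log:
            for pid, delta in top_consumers(prev, curr):
                log.write(f"{stamp} pid={pid} ticks={delta} cmd={read_cmdline(pid)}\n")
            log.flush()
            os.fsync(log.fileno())  # push the lines to disk before a possible hang
        prev = curr
```

The block only defines the functions; `run()` would have to be started separately, for example from a systemd unit. Because the VM becomes unresponsive during an incident, the sampler may stop being scheduled at that point, so the useful evidence is likely to be the last few entries written before the hang.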