
GitHub Runner taking 2x physical RAM - what's the fix?


I've now had four crashes of our AWS ERP servers: memory apparently maxes out and the system essentially dies, with 100% CPU and little to no available RAM.

Ubuntu 18.04.5 LTS (GNU/Linux 5.4.0-1060-aws x86_64) (AWS AMI)

Three times this occurred in the middle of a GitHub Action. The action was doing a database import followed by a Slack notification. You would think one of those steps caused the issue, but oddly all the steps completed normally: the database was fine and the Slack notification was sent.

GitHub itself lost connection with the runner, and virtual memory went through the roof even after the action was completed.

A fourth time this happened while NOTHING was running; the server was in fact idling with nothing going on. I don't have any logs or "top" screenshots of that occasion, but I did catch it in the act once:

Image of TOP display

So the system is an AWS VM with 4 GB of RAM. Note that I believe the SI who set up this system configured it with no swap space. This is arguably correct [very arguably] for a server, in the sense that if there's a memory leak you want the system to report out of memory and take corrective action, since with a leak you're going to die eventually anyway.
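For what it's worth, here's roughly how I confirmed the current swap and overcommit state on the box (just the standard free/swapon/sysctl invocations, nothing site-specific):

    # confirm total RAM and that no swap is configured
    free -h
    swapon --show

    # kernel overcommit policy (0 = heuristic overcommit, the Ubuntu default) and swappiness
    sysctl vm.overcommit_memory vm.swappiness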

In the short term, I was asked to just double the RAM. This seems somewhat unnecessary, as it's a very lightly loaded system (it normally runs with only about 2 GB of RAM in use even when doing a heavy batch job), and frankly, if the GitHub Runner.Worker maxes out at 7 GB on a 4 GB system, why wouldn't it max out at 16 GB on an 8 GB VM? But we'll see if it crashes again. I'm not averse to changing TFG's swap configuration, but I'm not sure it's a fix.
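One thing I'm considering instead of just throwing RAM or swap at it is putting a cgroup memory cap on the runner service via systemd, so that if the leak recurs the kernel OOM killer takes out the runner rather than wedging the whole box. This is only a rough sketch, and the unit name below is an example (the actual runner service name depends on the org/repo/runner it was installed for):

    # open a drop-in for the runner's systemd unit (unit name is illustrative)
    sudo systemctl edit actions.runner.EXAMPLE-ORG.EXAMPLE-RUNNER.service

    # drop-in contents: hard memory cap for the runner's cgroup
    [Service]
    MemoryMax=3G

    # apply and restart the runner
    sudo systemctl daemon-reload
    sudo systemctl restart actions.runner.EXAMPLE-ORG.EXAMPLE-RUNNER.service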

I have reported this to GitHub, but after more than three weeks of inaction I thought I'd check here and see if anyone has any ideas or fixes.

Thank you,

== John ==

Have you considered adding a swap file temporarily to give the server a little more breathing space when performing memory-heavy processes such as data imports? Once the work is complete, you can remove (or reduce) the swap file. I generally keep 2 GB of swap on non-ephemeral EC2 instances to give the machines a little headroom.
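Something along these lines, if you decide to try it (the path and size are just examples):

    # create and enable a temporary 2 GB swap file
    sudo fallocate -l 2G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile

    # once the heavy job is finished, take it away again
    sudo swapoff /swapfile
    sudo rm /swapfile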
There are no memory-intensive apps running. There is a catastrophic memory leak from a third-party app. So, increase swap to what? One terabyte?
That may sound argumentative; your suggestion is generally the accepted answer, but I think there's a larger issue here. When a server runs out of RAM, whether from a memory leak or anything else really, the offending process should be killed. That's not happening. Adding swap only prolongs the problem.
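For anyone else hitting this, the kernel log is where an actual OOM kill would show up; this is roughly what I'd check (standard dmesg/journalctl, nothing exotic):

    # look for OOM-killer activity in the kernel log
    sudo dmesg -T | grep -i -E 'out of memory|oom-killer'

    # or the same via journald
    sudo journalctl -k | grep -i oom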

