Score:2

Server Connection Issue - With OK CPU/RAM


Recently we had an incident in which we were unable to connect to multiple masters in our Redis cluster.

Connections from our code base were timing out. We were also unable to SSH into the box during this period, essentially locking us out.

This has happened on multiple occasions, and each time CPU usage was around 20% and memory usage was also around 20%. The number of TCP connections varied between 7k and 12k across events, well below what we would consider an alarming level.
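To make the connection counts above easier to compare between events, here is a minimal sketch of how they could be broken down by TCP state. It assumes the Linux `/proc/net/tcp` text format; the function name `count_tcp_states` is hypothetical, and this is illustrative only, not necessarily how the numbers above were collected.

```python
from collections import Counter

# TCP state codes as they appear in the "st" column of /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text: str) -> Counter:
    """Count sockets per TCP state from the text of /proc/net/tcp (or tcp6)."""
    counts = Counter()
    # The first line is a column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        code = fields[3].upper()
        # Unknown codes are kept as raw hex rather than dropped.
        counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as f:
                print(path, dict(count_tcp_states(f.read())))
        except OSError as exc:
            print(f"could not read {path}: {exc}")
```

A total that looks normal can still hide an unusual mix, for example many sockets in `SYN_RECV` or `CLOSE_WAIT`, so the per-state breakdown may be more telling than the raw count.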

Connections that were already established continued to function normally. Our metrics exporters were among those existing connections, so they could still collect metrics on connections, CPU, etc.

Network in/out would slowly decline as existing connections died off, but new connections could not be established at all, as if they were being refused by the server.

We have reviewed settings such as SOMAXCONN and available file descriptors, but have not yet been able to determine why new connections could not be made, as there were no clear anomalies in any of the stats we reviewed prior to each occurrence.
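For anyone wanting to look at the same settings, here is a rough sketch of how they could be read on a Linux host, together with the kernel's listen-queue counters. It assumes the standard `/proc` paths on Linux; the helper names `parse_tcpext` and `read_int` are hypothetical. A `ListenOverflows` value that keeps rising means a listening socket's accept queue was full when connections arrived, though that may or may not be what happened here.

```python
def parse_tcpext(netstat_text: str) -> dict:
    """Parse the TcpExt counters from the text of /proc/net/netstat.

    The file stores each group as two consecutive lines with the same
    prefix: one line of counter names, then one line of values.
    """
    lines = netstat_text.splitlines()
    for header, values in zip(lines, lines[1:]):
        if header.startswith("TcpExt:") and values.startswith("TcpExt:"):
            names = header.split()[1:]
            numbers = values.split()[1:]
            return {name: int(num) for name, num in zip(names, numbers)}
    return {}


def read_int(path: str) -> int:
    """Read the first whitespace-separated integer from a /proc file."""
    with open(path) as f:
        return int(f.read().split()[0])


if __name__ == "__main__":
    try:
        print("somaxconn:", read_int("/proc/sys/net/core/somaxconn"))
        print("tcp_max_syn_backlog:",
              read_int("/proc/sys/net/ipv4/tcp_max_syn_backlog"))

        # file-nr holds: allocated handles, unused handles, system-wide max.
        with open("/proc/sys/fs/file-nr") as f:
            allocated, _unused, maximum = f.read().split()
        print(f"file handles: {allocated} allocated of {maximum} max")

        with open("/proc/net/netstat") as f:
            ext = parse_tcpext(f.read())
        for key in ("ListenOverflows", "ListenDrops"):
            print(f"{key}: {ext.get(key, 'n/a')}")
    except OSError as exc:
        print(f"could not read /proc (is this a Linux host?): {exc}")
```

These counters are cumulative since boot, so sampling them before and during an event and comparing the difference is more useful than a single reading.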

The servers are running Amazon Linux 2 on x2gd.medium instance types on AWS.

The inability to log in via SSH, while the majority of the traffic was on another port, seemed quite odd.

Does anyone have any ideas as to why connections could not be made, while all obvious metrics seemed OK?

