Score:1

Identifying cause of too many CLOSE_WAIT in IIS


I have a Windows server running a web API that serves an Android app, and today I started getting alarms saying that my server was timing out.

This server is running behind Cloudflare.

When I connected to the server via RDC, I noticed that it was using 0% CPU but had more than 3,200 open connections.

The "normal" number of connections is closer to 300, so this was roughly ten times more.

Thinking the server was under attack, I enabled Cloudflare's "I'm Under Attack" mode, but it made no difference.

I restarted IIS with iisreset and things went back to normal for a few minutes, but then the number of connections started climbing again!
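
To see how fast the connections came back after each iisreset, a rough PowerShell sketch like the one below can be left running in a console. The port matches my site and the 30-second interval is arbitrary:

  # Rough sketch: log how many TCP connections port 80 has every 30 seconds,
  # to see how quickly the count climbs back up after an iisreset.
  while ($true) {
      $count = @(Get-NetTCPConnection -LocalPort 80 -ErrorAction SilentlyContinue).Count
      "{0}  {1} connections" -f (Get-Date -Format "HH:mm:ss"), $count
      Start-Sleep -Seconds 30
  }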

I jumped into Cloudflare's support chat and the agent said they were not seeing anything out of the ordinary and there was nothing they could do.

My server only allows connections from Cloudflare's servers.
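
(In case it matters: a restriction like that can be enforced with a Windows Firewall rule roughly like the sketch below. This is a generic illustration rather than my exact configuration; the rule name is made up and the address shown is just one entry from the list Cloudflare publishes at https://www.cloudflare.com/ips/.)

  # Sketch only: allow inbound HTTP/HTTPS solely from Cloudflare ranges.
  # The display name is a placeholder and the full, current list of ranges
  # should come from https://www.cloudflare.com/ips/.
  New-NetFirewallRule -DisplayName "HTTP from Cloudflare only" `
      -Direction Inbound -Protocol TCP -LocalPort 80,443 `
      -RemoteAddress 173.245.48.0/20 `
      -Action Allow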

I decided to check what those connections were and when I ran netstat, I got this:

Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    xxx:80       CF_IP_ADDRESS.157:13824  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.157:17952  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.173:21754  ESTABLISHED
  TCP    xxx:80       CF_IP_ADDRESS.173:22890  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.173:24456  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.173:55678  ESTABLISHED
  TCP    xxx:80       CF_IP_ADDRESS.173:63352  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.195:31634  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.195:56504  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.195:62466  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.205:14264  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.205:37858  ESTABLISHED
  TCP    xxx:80       CF_IP_ADDRESS.205:47142  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.205:50318  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.205:57534  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.205:63570  ESTABLISHED
  TCP    xxx:80       CF_IP_ADDRESS.211:35054  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.217:26940  ESTABLISHED
  TCP    xxx:80       CF_IP_ADDRESS.217:29042  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.217:37898  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.217:39096  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.217:46002  CLOSE_WAIT
  TCP    xxx:80       CF_IP_ADDRESS.217:63860  CLOSE_WAIT

These are just a few lines taken from a total of 3,622.

The interesting part is that of those 3,622 connections, 2,992 were in the CLOSE_WAIT state.
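
For anyone who wants the same breakdown without counting netstat lines by hand, a quick PowerShell sketch like this groups the connections by state (assuming the site listens on port 80, as above):

  # Tally TCP connection states on port 80 (sketch; adjust the port as needed).
  # A plain cmd alternative is: netstat -an | find /c "CLOSE_WAIT"
  Get-NetTCPConnection -LocalPort 80 |
      Group-Object -Property State |
      Sort-Object -Property Count -Descending |
      Format-Table -Property Name, Count -AutoSize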

As I said, whenever I ran iisreset, everything would work normally for a few minutes before genuine users of the app started getting timeouts again.

Cloudflare support said they couldn't see anything out of the ordinary, so I'm not sure whether this was an attack or something else.

The server is running IIS; could this be a bug of some kind? Is there any attack that follows this pattern and would leave so many connections in CLOSE_WAIT?

Any help would be really appreciated.

The server is running Windows Server 2016 and IIS 10.

Score:1

OK, I will post my findings here in case anyone needs them.

Around 10 hours before this issue started, I had run Windows Update and KB5005698 was installed. The update went onto both servers that support the Android app.

Oddly enough, the issue started at the same time on both servers, which is why I initially suspected an attack.

Once the server was no longer under high load, the issue stopped. I decided to migrate the web API from .NET 5 to .NET 6, so I installed the hosting bundle on the server and deployed the new build.
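
(For anyone following the same path: after installing the hosting bundle it's worth confirming that the server actually has the .NET 6 runtime and the ASP.NET Core IIS module before swapping the deployment. A quick sketch, assuming the usual hosting-bundle setup:)

  # Sketch: confirm the .NET 6 runtime and the ASP.NET Core module for IIS
  # are in place before pointing the site at the new build.
  dotnet --list-runtimes            # expect a Microsoft.AspNetCore.App 6.x entry
  Import-Module WebAdministration
  Get-WebGlobalModule | Where-Object { $_.Name -like "AspNetCoreModule*" }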

Since the issue had already stopped before the .NET version changed, the migration itself hadn't changed anything, so I just left the .NET 6 deployment in place.

Around 4 hours ago, I started getting alarms again, but this time it was because the web API was returning an excessive number of HTTP 500s, while the number of connections was normal. So I decided to revert the app to the .NET 5 version.

As soon as I did that, the number of connections started to climb, reaching more than 5,000 in just a minute, and the timeouts ran wild! I kept running iisreset and the same pattern kept repeating.

So I swapped back to .NET 6: the connections stopped piling up, but the HTTP 500s came back after a while.

It turned out the HTTP 500 was an easy code fix, so I fixed it and deployed again, targeting .NET 6.

So, no more runaway connections, and everything seems to be working smoothly.

So I came to the conclusion that the issue lies with KB5005698 in combination with .NET 5.

Deploying the same app targeting .NET 6 fixed the problem.

After thousands of bad reviews and lost revenue, everything is back to normal...

Lesson learned: I will never update the server again unless I really need to.

Hope it helps someone.

Comment from Lex Li:
Another rule you might add to your notes: Microsoft puts more testing resources into long-term support releases (.NET Core 3.1/.NET 6/.NET 8) than into short-term support releases (.NET 5/.NET 7), so an LTS runtime is preferred for hosting an application in production.