Tracing / Solving a sudden spike in Apache2

My server runs Ubuntu 20.04 with a pure LAMP stack and Apache 2.4.41. In the last few weeks there have been two occurrences where Apache2 was unresponsive (users couldn't load our website) and we couldn't work out why, but it started working again after I restarted Apache2 (systemctl restart apache2). I checked and MySQL was up, so I feel it's purely Apache2 hitting some limit and becoming unresponsive.

So I started tracing, logging the process count, i.e. writing the output of the command below

ps aux | grep apache | wc -l

into a text file every 5 seconds.

The command returns the number of processes whose command line contains the word "apache", which gives a rough count of the currently active Apache processes.
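
The logging loop is essentially a sketch like the one below (the exact script and log path are simplified here, not my literal setup):

# Every 5 seconds, append a timestamp and the apache process count.
# Note: "grep apache" also matches the grep process itself, so the count
# is off by one; "pgrep -c apache2" would avoid that.
while true; do
    echo "$(date '+%F %T') $(ps aux | grep apache | wc -l)" >> /var/log/apache_count.log
    sleep 5
done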

The usual process count ranges from 90 (off-peak) to 250-300 (peak). But occasionally (twice now, since we started logging) it shoots up to 700; the trend is 90 > 180 > 400 > 700, nearly doubling every 5 seconds.

I have checked the Apache error logs, syslog, access logs and so on, and failed to find any useful information. Initially I suspected a DDoS, but I couldn't find any useful information to "prove" that it is one.
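
One rough check (assuming the stock Ubuntu log location /var/log/apache2/access.log and the default combined log format) would be to rank client IPs by request count around the spike window; a single IP or small range dominating the log would at least hint at abuse:

# Top 10 client IPs by number of requests in the access log.
awk '{print $1}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -10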

Little info about my server configs -

  • uses the default mpm_prefork
  • MaxKeepAliveRequests 100
  • KeepAliveTimeout 5
  • ServerLimit 1000
  • MaxRequestWorkers 1000 (increased recently to "solve" the spike, it was 600 previously)
  • MaxConnectionsPerChild 0
  • MaxSpareServers 10
  • No firewall (ufw) or mod_evasive enabled.
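
For completeness, which MPM is actually loaded and where these limits are defined can be double-checked with standard commands (paths assume the stock Ubuntu/Debian Apache layout):

apache2ctl -V | grep -i 'mpm'       # shows which MPM is in use (prefork here)
apache2ctl -M | grep 'mpm'          # lists the loaded MPM module
grep -rin 'MaxRequestWorkers\|ServerLimit' /etc/apache2/   # where the limits are set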

Here comes my questions,

  1. Is there any way I can find out what is causing the spike if there are no useful logs at all? I feel that certain Apache processes are getting stuck and child processes keep being spawned, if that's how it works (sorry, not very familiar with server administration). (See the mod_status sketch after this list.)

  2. I noticed that after a spike the number of processes doesn't go down immediately; instead it decreases by 3-5 processes every 5 seconds, taking around 9-10 minutes to get from 700 processes back down to 100. I'm not sure why, but which config should I tweak to make the processes "die" faster? I was hoping that if the processes die fast enough, then even if there is a sudden spike my server will only be "down" for around 5-10 seconds max. From what I've read, my setting of KeepAliveTimeout 5 should kill them quickly enough, so why do they linger for up to 10 minutes? Should I set MaxConnectionsPerChild to something other than 0 (unlimited)?
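
Regarding question 1: my understanding is that mod_status would let me see what the busy workers are actually doing during a spike (client IP, request, state). The sketch below is my guess at how to enable and query it on Ubuntu; it assumes access from localhost is allowed, which is the default in status.conf:

# Enable mod_status (often already enabled on Ubuntu) and reload Apache.
sudo a2enmod status
sudo systemctl reload apache2

# During a spike: the full HTML page shows per-worker client and request
# (needs ExtendedStatus On, configurable in /etc/apache2/mods-available/status.conf).
curl -s 'http://localhost/server-status'

# Machine-readable summary: scoreboard plus BusyWorkers/IdleWorkers counters.
curl -s 'http://localhost/server-status?auto'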

My current approach is to find a way to implement #2 and a way to "prove" that processes are dying faster than they used to during a spike. Secondly, maybe implement a firewall to mitigate a DDoS, if that is really what this is.
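
For the "prove" part, and assuming mod_status is reachable as in the sketch above, a loop like the one below could log busy vs. idle worker counts alongside the process count, so I can compare how quickly workers are reaped before and after a tuning change:

# Append BusyWorkers/IdleWorkers every 5 seconds (log path is just an example).
while true; do
    counts=$(curl -s 'http://localhost/server-status?auto' \
        | grep -E '^(BusyWorkers|IdleWorkers):' | tr '\n' ' ')
    echo "$(date '+%F %T') $counts" >> /var/log/apache_workers.log
    sleep 5
done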

Thanks

HBruijn:
You don't mention what [mpm](https://serverfault.com/q/383526/37681) you're using, but be aware that the typical default, prefork, isn't the best when you get many concurrent requests.
Patrick Teng:
@HBruijn Sorry, I missed that out, but yes, I am using the default mpm_prefork.conf. I did not do much configuration on this server, as I mainly use it as provided by AWS to host my website, and just tweak it as and when needed (like now).