Possible causes for Apache not responding on port 443

Background: Debian Stretch amd64 server on Google Cloud with Apache 2.4.25. It's running a PHP-based website via proxy_fcgi to PHP-FPM. Backend database is PostgreSQL 10. Postgres packages have been installed from the official Postgres apt repo, everything else is vanilla from the Debian repos. There's a port 80 redirect to 443 with Let's Encrypt certificates. HTTP/2 and Brotli are enabled. There is also a reverse proxy to a Server-Sent Event daemon on the same server (https://github.com/vgno/ssehub).

Server has been up for over 2 years, but in the last few months there is an intermittent fault where the site stops responding to requests. It usually clears up after a couple of minutes. I've done a lot of log analysis, and it doesn't seem to be related to the server processes. CPU usage is nominal, memory usage is low, no errors appear in logs for Apache, PostgreSQL, FPM, syslog, ssehub. The server also has fail2ban installed but there are no log entries for that either. I've put in extra diagnostic logging in Apache and FPM to check for requests that take a long time to process, but that hasn't turned anything up.

Here's the output from iptables -L:

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
f2b-sshd   tcp  --  anywhere             anywhere             multiport dports ssh
DROP       udp  --  anywhere             anywhere             udp dpt:l2f policy match dir in pol none
DROP       all  --  anywhere             anywhere             ctstate INVALID
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     udp  --  anywhere             anywhere             multiport dports isakmp,ipsec-nat-t
ACCEPT     udp  --  anywhere             anywhere             udp dpt:l2f policy match dir in pol ipsec
DROP       udp  --  anywhere             anywhere             udp dpt:l2f

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
DROP       all  --  anywhere             anywhere             ctstate INVALID
ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  192.168.42.0/24      192.168.42.0/24     
ACCEPT     all  --  anywhere             192.168.43.0/24      ctstate RELATED,ESTABLISHED
ACCEPT     all  --  192.168.43.0/24      anywhere            
DROP       all  --  anywhere             anywhere            

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination         

Chain f2b-sshd (1 references)
target     prot opt source               destination         
RETURN     all  --  anywhere             anywhere            

Any suggestions for possible causes or things I should check? At the moment the only cause I can think of is network congestion, but that's very difficult to prove as it's an intermittent issue and usually clears up by the time I'm aware of it and start doing some tests. Plus it seems surprising that Google Cloud would have such frequent network issues. Do Google have some kind of traffic shaping policies that I'm not aware of? It's a very low traffic server and the problem frequently occurs out of hours when virtually no one is using the site.
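Because the fault usually clears before anyone can run tests, one option is to leave a small probe running on another machine so that each outage is timestamped and can be lined up against the Apache and munin data afterwards. The following is a minimal sketch in Python, not part of the existing setup: the function names `probe`, `format_result` and `monitor` are hypothetical, and it only checks whether a TCP connection to the port can be opened, not TLS or HTTP.

```python
import socket
import time
from datetime import datetime, timezone


def probe(host, port=443, timeout=5.0):
    """Attempt one TCP connection; return (ok, seconds_elapsed, detail)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, "connected"
    except OSError as exc:
        # OSError covers timeouts, refused connections and unreachable hosts.
        return False, time.monotonic() - start, type(exc).__name__


def format_result(host, port, result):
    """Render one probe result as a single timestamped (UTC) log line."""
    ok, elapsed, detail = result
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    status = "OK" if ok else "FAIL"
    return f"{stamp} {host}:{port} {status} {elapsed:.3f}s {detail}"


def monitor(host, port=443, interval=10.0, count=None):
    """Probe repeatedly; count=None means run until interrupted."""
    done = 0
    while count is None or done < count:
        print(format_result(host, port, probe(host, port)), flush=True)
        done += 1
        if count is None or done < count:
            time.sleep(interval)
```

Calling `monitor("www.example.com")` (a placeholder hostname) with output redirected to a file gives one line every ten seconds. Running it from both inside and outside the same network would help separate a network problem from a server-side one: a timeout suggests packets are being dropped somewhere, while a refused connection suggests nothing is listening on the port.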

Jyothi Kiranmayi
1. When did the issue start? 2. Have you made any modifications to your VM instance (e.g. installed an application, configured the firewalls, etc.)? Run the following command, share a screenshot of the result, and verify that port 443 is listening (you should see "Listen 443"). $ cat /etc/apache2/ports.conf
Kitserve
No changes have been made to the server configuration or the website code. The server is listening on port 443 - as I said, this is an intermittent fault. Most of the time the site responds normally. I'm not sure exactly when the issue started but it was first reported about 6 months ago.
Bakul Mitra
When the intermittent issue happens, can you check whether you are able to connect to your application service (website) from another VM instance within the same network? Are you using any load balancer behind the VM instance?
Kitserve
There is no load balancer. Next time it happens, if I manage to catch it while the problem is ongoing, I'll have a go at trying to connect from a few different locations. Not sure if any of my other VMs are in the exact same network, but I'll check.
Abhijith Chitrapu
@Kitserve have you checked, and is your problem resolved?
Kitserve
No, it's not resolved. I've lined up a few tests to try when the problem next occurs, mainly tcptraceroute as described at https://support.opendns.com/hc/en-us/articles/227989007-How-to-Running-a-TCP-Traceroute. Since I posted the original question, the problem hasn't reappeared so I don't have any more diagnostic information to share.
Srividya
To isolate this further, please provide the following information: Is the Apache server responding (e.g. is Apache running)? Are you able to connect to port 443? Is the certificate up to date? Are you able to connect over SSL/TLS with openssl s_client -connect localhost:443? Can you share the tcpdump and Apache logs from the time of the issue?
Kitserve
I appreciate the responses, but I thought I'd said quite clearly in the original issue, this is an **intermittent** issue. Normally Apache responds as normal. Port 443 is open, the certificate is correctly configured, and the site works as expected *almost* 100% of the time. If I was able to catch the problem while it was occurring and run some tests, I would have a much better idea of what was going on. My question is whether anyone has any suggestions of possible causes for Apache to randomly stop responding to requests without any log entries appearing. Thanks.
Jyothi Kiranmayi
Memory issues could be stopping Apache from responding to requests (HTTP/HTTPS). Also, check these settings in the configuration file (httpd.conf): **AcceptFilter http none, AcceptFilter https none**. I'd suggest that next time you start it, you start a trace running so that after it dies you can investigate what calls were happening just before it failed. You can use the following command after you start it to make sure you attach to the master process, all its children, and any new ones that get forked.
Jyothi Kiranmayi
pidlist=''; for pid in $(pgrep httpd); do pidlist="$pidlist -p $pid"; done; strace -tt -f $pidlist 2>&1 | tee /root/apache_strace.out This assumes the Apache process is called httpd; if it's called something else (like apache or apache2), swap the correct name into the command above.
Kitserve
Thanks, I wasn't aware of AcceptFilter, there aren't currently any references to that in the config so I'll check that out, and strace too. I considered memory issues, but if that was happening I'd expect to see something in the munin graphs, and also some reference to the OOM Killer in the error log. Server has 4GB of RAM and usually runs at less than 25% of that, so it would have to be a pretty sharp memory usage spike not to show any hint whatsoever.
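The OOM killer check mentioned above can be done mechanically rather than by eye. This is a minimal sketch in Python, assuming the kernel messages use the usual "invoked oom-killer" and "Out of memory: Kill process" / "Out of memory: Killed process" wording; the helper name `find_oom_events` is hypothetical, and the log path in the usage note is the Debian default, which may differ on other setups.

```python
import re

# Wording the Linux kernel typically uses when the OOM killer fires.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|Out of memory: Kill(ed)? process",
    re.IGNORECASE,
)


def find_oom_events(lines):
    """Return the log lines that look like OOM killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
```

For example, `with open("/var/log/kern.log") as fh: print(find_oom_events(fh))` should print an empty list if the OOM killer has not fired in the period that file covers, which would support ruling out a memory spike.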
Abhijith Chitrapu
@Kitserve have you checked, and is your problem resolved?
Kitserve
The problem hasn't reappeared since I opened this question, so it's not resolved as I haven't got any new diagnostic information to work with. I will update or close this question if anything changes.