We recently started transitioning from domain binding to using nginx as a proxy for our web apps.
Requests to the wildcard subdomain *.domain.tld are load-balanced by our firewall across two Linux machines (Debian 11), proxy-01 and proxy-02, which run nginx with proxy configurations for the *.domain.tld subdomains.
Both proxy-01 and proxy-02 have an /etc/hosts entry for webserver-07.
An example config for test.domain.tld:
upstream test {
    server webserver-07:44309;
}
server {
    server_name test.domain.tld;
    listen 443 ssl;
    location / {
        proxy_pass https://test;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
    client_max_body_size 0;
    large_client_header_buffers 4 32k;
    proxy_busy_buffers_size 512k;
    proxy_buffers 4 512k;
    proxy_buffer_size 256k;
    proxy_read_timeout 600s;
    ssl_certificate /etc/ssl/certs/_.domain.tld.crt;
    ssl_certificate_key /etc/ssl/private/_.domain.tld.key;
    ssl_trusted_certificate /etc/ssl/certs/Root_Cert.pem;
    access_log /var/log/nginx/test.domain.tld_access.log;
    error_log /var/log/nginx/test.domain.tld_error.log;
}
This setup has been up and running smoothly for the past ~6 months, until tonight, when webserver-07 lost its network connection for several hours for a reason unknown to me.
Whatever the issue was, our hardware guy got the machine connected to the network again. But even after webserver-07 was back, trying to connect to the website at test.domain.tld showed the nginx error page 500 Internal Server Error, and neither proxy-01 nor proxy-02 logged any requests to test.domain.tld_access.log while we watched it with tail -f.
However, rebooting both proxy-01 and proxy-02 fixed the issue.
We believe the upstream connection must have somehow become stale or corrupt when webserver-07 dropped off the network.
Can anyone tell me what exactly caused nginx to fail to proxy requests to the upstream even though the upstream machine was reachable again?
Are we missing any config parameters?
How do we prevent similar issues from occurring in the future?
Regards