I have a single server hosting several Rails sites. All sites are identified by unique host names (all domain names map to the same IP address). Both Rails and nginx are running in Docker containers. I am using nginx 1.23.1 running in a Docker image built from the official Docker image (I only added certbot for TLS certificate processing).
After recently adding another site, a very strange thing started happening. Right after I start nginx, everything works as expected: all the content returned matches the host name in the request. After a few hours, however, the content returned no longer matches the requested host name. This affects only the proxied content; all static resources are still served correctly based on the host name.
For example, when I request https://www.meaaa.com (all domains here are examples, not real domains), I get the HTML content from bbb.me.com. And since the content from bbb.me.com asks for styles and images that it expects to find in bbb.me.com, the server responds to all those requests with 404 (because static assets are served from the www.meaaa.com files, since the request host name is www.meaaa.com).
And if I request https://bbb.me.com, I get the HTML content from www.meaaa.com. Again, the assets specified in the markup are expected to come from www.meaaa.com, but since static assets are fetched correctly according to the host name bbb.me.com in the request, they're not found.
So the upstream Rails content from the two sites seems to have traded places, while the static assets are served correctly.
I have been using non-Docker nginx for years with multiple Rails sites, and have never seen this happen. It's not a question of requesting an undefined host; both hosts are declared in the configuration. If one host stopped being recognized, then I could assume that the content returned was just the default server, but in fact they are both recognized, just swapped. The fact that only the proxy content is switched and not the static assets shows that both host names are being recognized.
To summarize the symptoms, here is what curl shows:
$ curl -s https://www.meaaa.com | grep '<title>'
<title>BBB Site</title>
$ curl -s https://bbb.me.com | grep '<title>'
<title>MEAAA Site</title>
Requesting www.meaaa.com static assets individually using the www.meaaa.com host name works fine, as does requesting bbb.me.com assets from bbb.me.com.
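The curl check above could be scripted so that it runs on a schedule and records when the swap first appears. This is only a minimal sketch: the helper names `title_of` and `check_host` are hypothetical, and the expected titles are the ones shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print the text of the first <title> element read from stdin.
title_of() {
    sed -n 's:.*<title>\(.*\)</title>.*:\1:p' | head -n 1
}

# Hypothetical helper: compare the title served for a host name with the expected one.
# Prints one timestamped line, marked OK or MISMATCH.
check_host() {
    host=$1
    expected=$2
    actual=$(curl -s "https://$host" | title_of)
    if [ "$actual" = "$expected" ]; then
        echo "$(date -u +%FT%TZ) OK $host"
    else
        echo "$(date -u +%FT%TZ) MISMATCH $host: expected '$expected', got '$actual'"
    fi
}
```

Calling `check_host www.meaaa.com "MEAAA Site"` and `check_host bbb.me.com "BBB Site"` every few minutes and appending the output to a file would narrow down the time at which the content changes places.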
I also made sure the problem is not Docker. Inside the nginx container, I can curl each back-end and get the right content:
$ curl -s http://aaa:3000 | grep '<title>'
<title>MEAAA Site</title>
$ curl -s http://bbb:3000 | grep '<title>'
<title>BBB Site</title>
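Since nginx looks up the names in upstream server entries when it loads its configuration, it may also be worth recording which address each back-end name resolves to inside the nginx container over time. The sketch below assumes that `getent` is available in the image (not verified here); the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the address from the first line of
# `getent hosts`-style output on stdin ("ADDRESS NAME [ALIAS...]").
first_address() {
    awk 'NR == 1 { print $1 }'
}

# Hypothetical helper: log the address each back-end name currently resolves to.
log_backend_addresses() {
    for name in "$@"; do
        addr=$(getent hosts "$name" | first_address)
        echo "$(date -u +%FT%TZ) $name ${addr:-unresolved}"
    done
}
```

Running `log_backend_addresses aaa bbb` right after starting nginx and again once the swap has appeared would show whether the addresses behind those names changed in between.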
Here is the config for the www.meaaa.com site:
upstream aaa-rails {
    server aaa:3000;
}
server {
    server_name www.meaaa.com source.meaaa.com aaa.meinternal.com;
    root /var/www/aaa-rails/public;
    index index.html index.htm;
    location /cms {
        deny 172.22.188.2; # public network interface
        try_files $uri @app;
    }
    location / {
        try_files $uri/index.html $uri @app;
    }
    location @app {
        proxy_pass http://aaa-rails;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header Origin $scheme://$http_host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Port $server_port;
        proxy_set_header X-Forwarded-Host $host;
        proxy_redirect off;
    }
    location ~* ~/assets/ {
        try_files $uri @app;
    }
    listen 80;
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/www.meaaa.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/www.meaaa.com/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}
And here is the config for the bbb.me.com site:
upstream bbb-rails {
    server bbb:3000;
}
server {
    server_name bbb.me.com bbb-source.me.com bbb.meinternal.com;
    root /var/www/bbb-rails/public;
    index index.html index.htm;
    client_max_body_size 50m;
    location / {
        try_files $uri @app;
    }
    location @app {
        proxy_pass http://bbb-rails;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header Origin $scheme://$http_host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Port $server_port;
        proxy_set_header X-Forwarded-Host $host;
        proxy_redirect off;
    }
    location ~* ~/assets/ {
        try_files $uri @app;
    }
    listen 80;
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/bbb.me.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/bbb.me.com/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}
To me the strangest thing is that restarting nginx fixes the problem, but only temporarily. I don't think there is any caching going on, and I don't see any errors in the nginx logs. Any suggestions as to what to look at would be most appreciated.
UPDATE
The two sites that changed places were the ones that were using HTTPS. I modified several others to use HTTPS, and now three of them are wrong in a kind of round-robin: ask for aaa, get bbb; ask for bbb, get ccc; ask for ccc, get aaa. Two other sites just don't respond. It's as if some unpredictable event triggers nginx to corrupt whatever routing tables it uses for serving proxy content.
For now, since this is a production server, I am restarting nginx every 60 minutes. I am trying to set up a staging server as a duplicate of the production server, hoping that the same problem will surface there so that I can try to figure out the problem without bringing down the sites.
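As an alternative to a blind hourly restart, the workaround could restart nginx only when a host is actually serving the wrong content. This is a rough sketch under stated assumptions: the container name `nginx` is a placeholder (the real name is not given here), the helper names are hypothetical, and the expected titles are the ones from the curl output earlier.

```shell
#!/bin/sh
# Hypothetical container name; override it through the environment if it differs.
NGINX_CONTAINER=${NGINX_CONTAINER:-nginx}

# Print the <title> text served for a host name.
served_title() {
    curl -s "https://$1" | sed -n 's:.*<title>\(.*\)</title>.*:\1:p' | head -n 1
}

# Restart nginx when any host serves an unexpected title.
# Arguments are pairs: HOST EXPECTED_TITLE HOST EXPECTED_TITLE ...
# Returns 0 if a restart was triggered, 1 if every host matched.
restart_if_swapped() {
    while [ "$#" -ge 2 ]; do
        if [ "$(served_title "$1")" != "$2" ]; then
            echo "$(date -u +%FT%TZ) $1 served the wrong title; restarting $NGINX_CONTAINER"
            docker restart "$NGINX_CONTAINER"
            return 0
        fi
        shift 2
    done
    return 1
}
```

Something like `restart_if_swapped www.meaaa.com "MEAAA Site" bbb.me.com "BBB Site"` could then run from cron every few minutes; the timestamps it prints would also show how long after a restart the problem comes back.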