Score:1

nginx serving content from wrong proxy

EK0

I have a single server hosting several Rails sites. All sites are identified by unique host names (all domain names map to the same IP address). Both Rails and nginx run in Docker containers. I am using nginx 1.23.1, in an image built from the official Docker image (I only added certbot for TLS certificate processing).

After recently adding another site, a very strange thing started happening. Right after I start nginx, everything works as expected: all the content returned matches the host name in the request. After a few hours, though, the content returned no longer matches the requested host name. This affects only the proxied content; all static resources are still served correctly based on the host name.

For example, when I request https://www.meaaa.com (all domains here are examples, not real domains), I get the HTML content from bbb.me.com. And since the content from bbb.me.com asks for styles and images that it expects to find in bbb.me.com, the server responds to all those requests with 404 (because static assets are served from the www.meaaa.com files, since the request host name is www.meaaa.com).

And if I request https://bbb.me.com, I get the HTML content from www.meaaa.com. Again, the assets referenced in the markup are expected to come from www.meaaa.com, but since static assets are served according to the request's host name (bbb.me.com), they're not found.

So the upstream Rails content from the two sites seems to have traded places, while the static assets are served correctly.

I have been using non-Docker nginx for years with multiple Rails sites and have never seen this happen. It's not a question of requesting an undefined host: both hosts are declared in the configuration. If one host stopped being recognized, I could assume the content returned was just that of the default server, but in fact both are recognized, just swapped. The fact that only the proxied content is switched, and not the static assets, shows that both host names are being recognized.

To summarize the symptoms, here is what curl shows:

$ curl -s https://www.meaaa.com | grep '<title>'
  <title>BBB Site</title>

$ curl -s https://bbb.me.com | grep '<title>'
  <title>MEAAA Site</title>

Requesting www.meaaa.com static assets individually using the www.meaaa.com host name works fine, as does requesting bbb.me.com assets from bbb.me.com.

I also made sure the problem is not Docker. Inside the nginx container, I can curl each back-end and get the right content:

$ curl -s http://aaa:3000 | grep '<title>'
  <title>MEAAA Site</title>

$ curl -s http://bbb:3000 | grep '<title>'
  <title>BBB Site</title>

Here is the config for the www.meaaa.com site:

upstream aaa-rails {
    server aaa:3000;
}

server {
    server_name www.meaaa.com source.meaaa.com aaa.meinternal.com;
    root /var/www/aaa-rails/public;
    index index.html index.htm;

    location /cms {
      deny 172.22.188.2; # public network interface
      try_files $uri @app;
    }
    location / {
        try_files $uri/index.html $uri @app;
    }

    location @app {
        proxy_pass http://aaa-rails;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header Origin $scheme://$http_host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Port $server_port;
        proxy_set_header X-Forwarded-Host $host;
        proxy_redirect off;
    }

    location ~* ^/assets/ {
        try_files $uri @app;
    }

    listen 80;

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/www.meaaa.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/www.meaaa.com/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}

And here is the config for the bbb.me.com site:

upstream bbb-rails {
    server bbb:3000;
}

server {
    server_name bbb.me.com bbb-source.me.com bbb.meinternal.com;
    root /var/www/bbb-rails/public;
    index index.html index.htm;

    client_max_body_size 50m;

    location / {
        try_files $uri @app;
    }

    location @app {
        proxy_pass http://bbb-rails;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $http_host;
        proxy_set_header Origin $scheme://$http_host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Port $server_port;
        proxy_set_header X-Forwarded-Host $host;
        proxy_redirect off;
    }

    location ~* ^/assets/ {
        try_files $uri @app;
    }

    listen 80;

    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/bbb.me.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/bbb.me.com/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
}

To me the strangest thing is that restarting nginx fixes the problem, but only temporarily. I don't think there is any caching going on, and I don't see any errors in the nginx logs. Any suggestions as to what to look at would be most appreciated.

UPDATE

The two sites that changed places were the ones that were using HTTPS. I modified several others to use HTTPS, and now three of them are wrong in a kind of round-robin: ask for aaa, get bbb; ask for bbb, get ccc; ask for ccc, get aaa. Two other sites just don't respond. It's as if some unpredictable event triggers nginx to corrupt whatever routing tables it uses for serving proxy content.

For now, since this is a production server, I am restarting nginx every 60 minutes. I am trying to set up a staging server as a duplicate of the production server, hoping that the same problem will surface there so that I can try to figure out the problem without bringing down the sites.
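For reference, the hourly restart is just a cron entry along these lines (the compose project path /srv/www and the service name nginx are placeholders, not my actual setup):

# /etc/crontab: restart the nginx container at the top of every hour
0 * * * *  root  cd /srv/www && docker compose restart nginx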

ajmeese7
Where do you have the `bbb` and `aaa` used in the proxies defined?
EK0
Not sure I understand the question. The proxy refers to the upstream block in the config, which points to a Docker container running Rails.
ajmeese7
Yes, in the `upstream` blocks you refer to `bbb` and `aaa`. Where are those defined to point to the Docker containers?
EK0
aaa and bbb are the service names in the docker-compose.yml files. They serve as host names in the Docker network, so in a container in the same network you can `ping aaa`, `curl http://aaa:3000`, etc.
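Roughly like this (a trimmed-down sketch, not the actual files; paths and port mappings are illustrative):

services:
  nginx:
    build: ./nginx      # official nginx image plus certbot
    ports:
      - "80:80"
      - "443:443"
  aaa:
    build: ./aaa        # Rails app behind www.meaaa.com
    expose:
      - "3000"
  bbb:
    build: ./bbb        # Rails app behind bbb.me.com
    expose:
      - "3000"

All services share the same Compose network, so nginx resolves `aaa` and `bbb` through Docker's internal DNS.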
Score:0
EK0

It turns out that the problem stemmed from the fact that nginx only resolves each back-end name once (when it loads its configuration), and after that assumes that the IP address will never change. Thanks to Ángel over at the nginx mailing list, who pointed this out.

In my case, since each back end is a web application, I run logrotate daily: if the logs have grown large enough, the log files are rotated and compressed, and the corresponding Rails app is restarted.

It can easily happen that two or more Rails apps are restarted more or less simultaneously, and there is no guarantee that they will come back up in precisely the right order to get their previous IP addresses back on the Docker internal network. That explains what was happening: every once in a while, the Rails apps exchanged IP addresses, but nginx did not know about it, so it kept forwarding requests to the old IP addresses while still serving static assets correctly. I simulated this by stopping and restarting two applications and was able to reproduce the problem exactly. I had also added a script that monitors the sites every five minutes, and its logs showed that the problem always seemed to happen around the same time, which is consistent with logrotate being the trigger.
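The reproduction was essentially this (service names as in the configs above; which container ends up with which IP is up to Docker, so the swap is not guaranteed on every run):

$ docker compose stop aaa bbb
$ docker compose start bbb    # bbb may now receive the IP that aaa had
$ docker compose start aaa    # aaa takes bbb's old IP; nginx still holds the stale mapping
$ curl -s https://www.meaaa.com | grep '<title>'
  <title>BBB Site</title>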

Ángel pointed out that, instead of restarting nginx, it is possible simply to reload the configuration (`nginx -s reload`), which causes less of an interruption for visitors. He also pointed out that nginx can be made to look up names every few minutes instead of only once (see https://forum.nginx.org/read.php?2,215830,215832#msg-215832).
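For the record, the re-resolution trick looks roughly like this, adapted to my aaa site (a sketch; 127.0.0.11 is Docker's embedded DNS). Putting the upstream address in a variable makes nginx resolve it at request time through the configured resolver, instead of once at startup:

resolver 127.0.0.11 valid=30s;  # Docker's embedded DNS; cache lookups for at most 30s

location @app {
    set $aaa_upstream http://aaa:3000;  # a variable forces per-request resolution
    proxy_pass $aaa_upstream;
    # ...same proxy_set_header directives as in the config above...
}

Note that with a variable in proxy_pass, the named upstream block is no longer used.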

To resolve the problem, at least for now, I've configured Docker to assign a fixed IP address to every container, so nginx can safely continue to resolve names only once. This has some consequences, however: I can no longer run commands such as db:migrate or assets:precompile through the same docker-compose.yml file as the running Rails app (you get an "address is in use" error). For now I am using `docker compose exec` instead of `run`, but this seems to affect `docker compose restart` (you get "address is in use" for a few seconds after running the exec). If this becomes a problem I may revert to dynamic IP addresses and make nginx reload its configuration as part of the logrotate process.
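For reference, pinning the addresses looks something like this in docker-compose.yml (the subnet and addresses here are illustrative):

networks:
  default:
    ipam:
      config:
        - subnet: 172.22.188.0/24

services:
  aaa:
    networks:
      default:
        ipv4_address: 172.22.188.10
  bbb:
    networks:
      default:
        ipv4_address: 172.22.188.11

The "address is in use" error follows directly from this: a one-off `docker compose run` container tries to claim the same fixed address as the already-running service.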

EK0
I think assigning fixed IP addresses to Docker containers is not really the way to go. Besides the odd container restart behavior mentioned above, it requires knowing which virtual networks are in use on each server, so moving to another server might mean changing all the IP addresses in the configuration files. Instead, I've opted to use Unix sockets for communication between nginx and the Rails back ends. There is then no question of name resolution at all, and you can run as many commands as you need through the same docker-compose.yml file without IP address conflicts.
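A sketch of the socket setup, with illustrative paths: the Rails container's Puma binds to a socket on a volume shared with the nginx container, and the upstream points at that socket instead of a name and port.

# puma.rb in the aaa container
bind "unix:///var/run/rails/aaa.sock"

# nginx site config
upstream aaa-rails {
    server unix:/var/run/rails/aaa.sock;
}

Both containers mount the same volume at /var/run/rails.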
mwieczorek
I have the opposite problem. When I run `nginx -s reload` after `compose up`, the sites are switched. Only a compose restart of the nginx container fixes it.
EK0
Are you sure it is nginx? IP addresses change when the containers behind nginx are restarted, because Docker assigns each one the first available IP. In any case, using Unix sockets eliminates this entire problem, as nginx talks to the other containers through fixed locations in the file system instead of over the network.