Score:0

Failing to proxy arbitrary URLs with NGINX

I've tried multiple solutions to proxy arbitrary URLs through NGINX. By that I mean requesting http://myhost/proxy/http://someurl.com?whatever=foo and getting the content of http://someurl.com?whatever=foo served. I need this to add CORS headers to the response. One solution I tried is this one, and the other I found in this answer; both return a 404 instead of the content I need proxied.

Here is my (standard, vanilla) config with the solution proposed in the second answer added under server -> location:


#user  nobody;
worker_processes  1;

#error_log  logs/error.log;
#error_log  logs/error.log  notice;
#error_log  logs/error.log  info;

#pid        logs/nginx.pid;


events {
    worker_connections  1024;
}


http {
    include       mime.types;
    default_type  application/octet-stream;

    #log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
    #                  '$status $body_bytes_sent "$http_referer" '
    #                  '"$http_user_agent" "$http_x_forwarded_for"';

    #access_log  logs/access.log  main;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;

    #gzip  on;

    server {
        listen       8080;
        server_name  localhost;

        #charset koi8-r;

        #access_log  logs/host.access.log  main;

        location ~ /proxy/\?url=(.*)$ {
            proxy_pass $1;
            proxy_set_header Host $host;
        }

        location ~ /proxy/https\:\/\/(.*)$ {
            proxy_pass $1;
            proxy_set_header Host $host;
        }

        location / {
            root   html;
            index  index.html index.htm;
        }

        #error_page  404              /404.html;

        # redirect server error pages to the static page /50x.html
        #
        error_page   500 502 503 504  /50x.html;
        location = /50x.html {
            root   html;
        }

        # proxy the PHP scripts to Apache listening on 127.0.0.1:80
        #
        #location ~ \.php$ {
        #    proxy_pass   http://127.0.0.1;
        #}

        # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000
        #
        #location ~ \.php$ {
        #    root           html;
        #    fastcgi_pass   127.0.0.1:9000;
        #    fastcgi_index  index.php;
        #    fastcgi_param  SCRIPT_FILENAME  /scripts$fastcgi_script_name;
        #    include        fastcgi_params;
        #}

        # deny access to .htaccess files, if Apache's document root
        # concurs with nginx's one
        #
        #location ~ /\.ht {
        #    deny  all;
        #}
    }


    # another virtual host using mix of IP-, name-, and port-based configuration
    #
    #server {
    #    listen       8000;
    #    listen       somename:8080;
    #    server_name  somename  alias  another.alias;

    #    location / {
    #        root   html;
    #        index  index.html index.htm;
    #    }
    #}


    # HTTPS server
    #
    #server {
    #    listen       443 ssl;
    #    server_name  localhost;

    #    ssl_certificate      cert.pem;
    #    ssl_certificate_key  cert.key;

    #    ssl_session_cache    shared:SSL:1m;
    #    ssl_session_timeout  5m;

    #    ssl_ciphers  HIGH:!aNULL:!MD5;
    #    ssl_prefer_server_ciphers  on;

    #    location / {
    #        root   html;
    #        index  index.html index.htm;
    #    }
    #}
    include servers/*;
}

Edit: I also tried matching the location with ~* /proxy/(?<pschema>https?)://(?<phost>[\w.]+)(?<puri>\/.*). According to the nginx config checker it should match http://localhost:8080/proxy/http://google.com/, but it doesn't work in my config.

Score:1

nginx's location directive matches only against the path part of the URL, not the query string.

Therefore

location ~ /proxy/\?url=(.*)$ {

never matches anything, because it tries to match the query string. Use the $arg_url variable instead to get the value of the url query argument.
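Here is a minimal sketch of the $arg_url approach. It assumes requests such as /proxy/?url=http://www.example.com/some/file.json; the location path and the example URL are hypothetical. nginx does not URL-decode $arg_url, so the target has to be passed unencoded, and an & in the target's own query string would start a new argument. The target should also include at least a path (/). When proxy_pass contains variables but no URI part, nginx forwards the original request URI (/proxy/?url=...) instead. The sketch leaves the Host header at nginx's default, which is the host taken from proxy_pass. By contrast, proxy_set_header Host $host would send the proxy's own name (localhost), and the target server may well answer that with a 404. As written, it will proxy to any host a client names.

    location = /proxy/ {
        # Hypothetical usage: /proxy/?url=http://www.example.com/some/file.json
        # Reject requests that do not carry an http(s) target URL.
        if ($arg_url !~* "^https?://") {
            return 400;
        }
        resolver 8.8.8.8;      # required: the upstream is only known at request time
        proxy_pass $arg_url;   # the full target URL, scheme included
    }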

For the second location, your proxy_pass destination is missing the protocol.

location ~ /proxy/https\:\/\/(.*)$ {
        proxy_pass $1;
        proxy_set_header Host $host;
}   

Your regular expression captures the domain and path part of the URL. However, proxy_pass requires a full URL, which must include the protocol.
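Here is a sketch of the second location with the scheme restored. nginx merges adjacent slashes before location matching (merge_slashes is on by default), so the regex actually sees /proxy/https:/host/... and this version therefore accepts one or more slashes. When proxy_pass contains variables and a URI, nginx sends that URI as-is without appending the query string, which is why $is_args$args is added. The proxy_ssl_server_name on line sends the target host name via SNI, which many HTTPS sites require. The resolver address is simply the one used elsewhere in this thread.

    location ~ ^/proxy/https:/+(.*)$ {
        resolver 8.8.8.8;                    # upstream host is only known at request time
        proxy_ssl_server_name on;            # send SNI for the target host
        # $1 is "host/path"; put back the scheme the regex consumed
        proxy_pass https://$1$is_args$args;  # query string is not appended automatically here
    }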

Nikita Fuchs
Thanks, I've adjusted both to https://pastebin.com/r93aeY6K and `location /proxyurl/ { resolver 8.8.8.8; proxy_pass $arg_url; proxy_set_header Host $host; }`, but in the first case nginx simply looks for a local resource, and in the second case it returns only a 404. Yet the config checker at https://nginx.viraptor.info/ says both match the corresponding request URLs.
Score:-1

The solution is the following (see the explanation after it):

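    location ~* ^/proxy/(?<pschema>https?):/+(?<phost>[^/]+)(?<puri>/.*)$ {
        # nginx merges "//" into "/" before location matching, so accept one or more slashes.
        # phost is everything up to the next "/", so hyphens and ports are allowed.
        set $adr $pschema://$phost;
        rewrite .* $puri break;         # send only the path upstream; the query string is kept
        resolver 8.8.8.8;               # required because proxy_pass uses a variable
        proxy_pass $adr;
        # Debug headers showing what the regex captured
        add_header X-debug-message "adr: $adr" always;
        add_header X-debug-message "puri: $puri" always;
        add_header X-debug-message "pschema: $pschema" always;
        add_header X-debug-message "phost: $phost" always;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $phost;   # the target's host name, not the proxy's
        proxy_set_header X-NginX-Proxy true;
        proxy_redirect off;             # pass upstream redirects to the client unchanged
        proxy_connect_timeout 1;
        proxy_intercept_errors on;
        expires 30;
    }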

The original regex expects :// after http(s). However, nginx collapses adjacent slashes in the URI before matching locations (the merge_slashes directive, on by default), so the path it sees is /proxy/http:/www.google.com/ and the pattern couldn't match. Also, a request is sometimes redirected instead of proxied, but that redirect comes from the service you are requesting, not from nginx. For example, curl -vv http://localhost:8080/proxy/http://google.com/ makes Google redirect you to www.google.com, whereas curl -vv http://localhost:8080/proxy/http://www.google.com/ is proxied.
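If you would rather keep the literal :// in the pattern, another option is to turn slash merging off. The sketch below assumes the single server on port 8080 from the question. Note that merge_slashes off affects every location in that server: a request for //images/a.png, for example, would no longer match a location /images/ prefix. Only a few directives from the solution above are repeated here, and the [^/]+ host pattern is an assumption that also allows hyphens and ports.

    server {
        listen       8080;
        server_name  localhost;
        merge_slashes off;   # keep "//" in the URI used for location matching

        location ~* ^/proxy/(?<pschema>https?)://(?<phost>[^/]+)(?<puri>/.*)$ {
            set $adr $pschema://$phost;
            rewrite .* $puri break;   # send only the path upstream; the query string is kept
            resolver 8.8.8.8;
            proxy_pass $adr;
            proxy_set_header Host $phost;
        }
    }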

This approach doesn't work for loading entire websites, though: their resources (images, .js, .css, etc.) are linked relatively, so the browser tries to fetch them from your server. Proxying individual files works fine, which was my goal.
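Since the point was to add CORS headers to proxied files, here is a sketch of directives that could go inside the proxy location from the solution above. The wildcard origin is an assumption, and a specific origin may be safer. proxy_hide_header drops any Access-Control-Allow-Origin the upstream already sends, so the client does not receive two values. The always parameter adds the header to error responses too. Keep these lines in the same location as the existing add_header lines, because add_header directives are inherited from an outer level only when the current level defines none.

    # Hypothetical additions inside the /proxy/ location:
    proxy_hide_header Access-Control-Allow-Origin;    # drop the upstream's own value, if any
    add_header Access-Control-Allow-Origin * always;  # allow any origin (assumption)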
