
Using URLs with special characters in nginx maps


With nginx maps it is possible to rewrite multiple URLs from a single map file. The problem arises when a URL contains special characters. I have been racking my brain trying to get this right, and I hope this question and its solution might spare others some gray hair.

Let's set the scenario.

A Linux server (Debian/Ubuntu) running standard nginx. DNS points to this server, and the hostname resolves to a server config. A map that contains no duplicate entries, with incoming and outgoing URLs (both resolvable).

The map setup would contain the following:

map $host$request_uri $rewrite_uri {
    include /<path to file filename>;
}

The map file itself contains one entry per line, each terminated with a semicolon.

example.com/Böhme https://anotherexample.org/SomeWeirdPath/Böhme;

The server config needed for this mapping to work:

server {
    listen 443 ssl http2;
    ssl_certificate /<absolute path to crt file>;
    ssl_certificate_key /<absolute path to key file>;
    server_name example.com;
    proxy_set_header X-Forwarded-For $remote_addr;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers HIGH:!aNULL:!MD5;
    ssl_dhparam <absolute path to Diffie Hellman key>;
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains";
    server_tokens off;
    if ($rewrite_uri) {
        rewrite ^ $rewrite_uri redirect;
    }
    rewrite ^ <default URL> redirect;
}

I have simplified this server config so we can concentrate on the map settings. The config assumes that the domain uses SSL and that the certificate is valid. The if block only runs when $host$request_uri matches an entry in the map, so that $rewrite_uri is non-empty; otherwise the last rewrite is executed.
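To make the lookup-and-fallback logic concrete, here is a minimal Python model of it. This is a sketch, not nginx itself: `resolve_redirect` and `DEFAULT_URL` are hypothetical names, the comparison is a plain exact string match (a simplification of how nginx matches map keys), and it assumes that $request_uri holds the URI exactly as the client sent it, i.e. percent-encoded.

```python
# Hypothetical stand-in for the <default URL> placeholder in the server config.
DEFAULT_URL = "https://default.invalid/"


def resolve_redirect(host, request_uri, redirect_map, default_url):
    """Model the map lookup followed by the if/rewrite fallback."""
    # An nginx map without a "default" entry yields an empty string on a miss.
    rewrite_uri = redirect_map.get(host + request_uri, "")
    # Mirrors: if ($rewrite_uri) { rewrite ... } else fall through to the default.
    return rewrite_uri if rewrite_uri else default_url


target = "https://anotherexample.org/SomeWeirdPath/Böhme"
utf8_map = {"example.com/Böhme": target}          # key as written in the map file
encoded_map = {"example.com/B%C3%B6hme": target}  # key percent-encoded

browser_request_uri = "/B%C3%B6hme"  # assumed form of $request_uri

print(resolve_redirect("example.com", browser_request_uri, utf8_map, DEFAULT_URL))
print(resolve_redirect("example.com", browser_request_uri, encoded_map, DEFAULT_URL))
```

Under these assumptions the UTF-8 key never matches and the request falls through to the default redirect, while the percent-encoded key matches.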

The Question

How do I transform the $request_uri so that nginx understands it correctly? The map file contains the value in UTF-8, but it seems that nginx wants the $request_uri URL-encoded and in hexadecimal.

$request_uri as in the map file

example.com/Böhme

$request_uri URL-encoded, as sent by the browser

example.com/B%C3%B6hme

$request_uri as I think nginx wants it

example.com/B\xC3\xB6hme
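The three spellings above are different notations for the same byte sequence, which a short Python sketch can show. The variable names are illustrative only. The `\x` form is produced here by hand and may simply be how a log or debugging tool displays non-ASCII bytes, rather than something nginx expects in a map file.

```python
from urllib.parse import quote, unquote_to_bytes

path = "/Böhme"

# Raw UTF-8 bytes of the path: "ö" (U+00F6) becomes the two bytes C3 B6.
raw_bytes = path.encode("utf-8")

# Percent-encoded form, as a browser would put it on the wire.
percent_encoded = quote(path)

# Backslash-x notation: printable ASCII stays as-is, every other byte is escaped.
escaped = "".join(
    chr(b) if 0x20 <= b < 0x7F else f"\\x{b:02X}" for b in raw_bytes
)

print(percent_encoded)
print(escaped)
# Decoding the percent-encoded form gives back the same raw bytes.
print(unquote_to_bytes(percent_encoded) == raw_bytes)
```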

I can't seem to find a system package that has this feature, and I think I am starting to reinvent the wheel here.

I would need to:

create a function that URL-decodes the list, as per "How to decode URL-encoded string in shell?":

function urldecode() { local i="${*//+/ }"; echo -e "${i//%/\\x}"; }

and then use an octal dump (od), as per "Convert string to hexadecimal on command line", so that the map bucket is created in memory with the correct values for the if statement test.
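As an alternative to the shell steps above, the conversion could go the other way: keep the map file readable in UTF-8 and generate a percent-encoded copy for nginx to include. The following is a sketch under stated assumptions, not a confirmed fix. It assumes the map key has to equal the percent-encoded $request_uri, that keys have the form `host/path` with no query string, and that each line is `key target;`. The helper name `encode_map_line` is hypothetical.

```python
from urllib.parse import quote, unquote


def encode_map_line(line):
    """Percent-encode the path part of the key in one 'key target;' map line."""
    key, target = line.strip().rstrip(";").split(None, 1)
    host, slash, path = key.partition("/")
    # Decode first so that already-encoded keys are not encoded twice.
    encoded_path = quote(unquote(path), safe="/")
    # The redirect target is left untouched in this sketch.
    return f"{host}{slash}{encoded_path} {target};"


source_line = "example.com/Böhme https://anotherexample.org/SomeWeirdPath/Böhme;"
print(encode_map_line(source_line))
```

Whether the redirect target also needs encoding is left open here. Running the helper over every line of the UTF-8 file and writing the result to a second file would give a map that could be included in place of the original.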

It's starting to feel like rocket science, and I can't believe that nobody else has solved this problem before. I just can't seem to find a solution.

Ivan Shatsky
Check the article [Matching non ASCII characters in NGiNX location](https://blog.rabin.io/quick-tip/matching-non-ascii-characters-in-nginx-location) and the links it refers to.
I think you are misusing the term URN here. You should use the term URI instead. A URN is a globally unique identifier for a resource, and the path component of a URL is not globally unique.
@TeroKilkanen point taken, topic changed as requested... still can't solve it though, but I am not giving up. It's going on the backburner for now...