Score:2

My Nginx web server does not correctly serve content with accented characters in file names

it flag
RDK

I just created a new web server on a Linux box with Nginx. It seems to be working correctly for most content.

I just converted a site which used to be on a Windows Server box, and for the most part the content and functionality (HTML and JavaScript) transferred OK. However, I have discovered that Nginx seems to have problems with image names (perhaps other files too?) which contain accented characters, for example é, è, ô, etc.

If there were only a handful I could just manually rename the files, but there are hundreds (maybe thousands), which makes a manual process unworkable. Can someone offer a way to easily rename files with these accented characters?

Thanks..RDK

edit: The original conversion of the old site had lots of pages using "windows-1252" and a few using "UTF-8" as the charset. The latter had issues displaying special characters in page content, usually as the "black diamond ?" symbol. The former seemed to have issues with JavaScript and also some display issues. After a web search on that issue I changed all "charset=" values to "iso-8859-1", which is the default for many browsers, and that corrected all of those issues. But now I have the special characters in file names problem...

djdomi avatar
za flag
Hi, could it be a [charset](https://serverfault.com/questions/312177/how-to-enable-correct-charset-http-header-in-nginx) issue? Best practice is not to use special characters or spaces in file names.
Gilles Quenot avatar
cn flag
The default seems to be UTF-8 nowadays
Gilles Quenot avatar
cn flag
Have you found your answer?
Score:2
cn flag

One way is to remove the diacritics with Perl's rename:

If you need to rename files by replacing diacritics with their ASCII equivalents:

rename -u utf8 '
    BEGIN{use Text::Undiacritic qw(undiacritic)}
    s/.*/undiacritic($&)/e
' éééé.txt 
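# output reported by rename (e.g. when run with -v or a -n dry run):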
rename(éééé.txt, eeee.txt)
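
If the files are spread across a whole directory tree, something like the following sketch might help (the web-root path is hypothetical; keep the -n flag for a dry run first, then drop it to actually rename):

find /var/www/html -depth -type f -print0 | xargs -0 rename -n -u utf8 '
    BEGIN{use Text::Undiacritic qw(undiacritic)}
    s{([^/]+)$}{undiacritic($1)}e   # only touch the file name, not the directories
'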

Another way is to use the detox utility, available as a package on Debian/Ubuntu and other distros.
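
A minimal sketch of detox usage, assuming a hypothetical web root of /var/www/html; the first invocation is a dry run that only shows what would be renamed:

detox -v --dry-run -r /var/www/html
detox -v -r /var/www/html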


A last way is to use this script, based on convmv(1), translated into English from a French project on forum.ubuntu-fr.org:

It's intended to fix file names stored in the wrong charset by converting them to UTF-8 (not a script of mine, but by Lapogne71); it could solve the issue.

#!/bin/bash

VERSION="v0.04"

#---------------------------------------------
# This script loops the "convmv" utility, which converts file names encoded in
# something other than UTF-8 to UTF-8.
# Restart the script with the ALLCODES argument if no result has been found
#---------------------------------------------

# here are the colors of the text displayed in the shell
RED="\\033[1;31m"
NORMAL="\\033[0;39m"
BLUE="\\033[1;36m"
GREEN="\\033[1;32m"

echo
echo -e "$GREEN $0 $NORMAL  $VERSION"
echo

echo "---------------------------------------------
This script loops the 'convmv' utility, which converts file names encoded in
something other than UTF-8 to UTF-8. Restart the script with the ALLCODES argument
if no result has been found.
---------------------------------------------"

# The main loop launches convmv tests to "visually" detect the original encoding
# We only loop over the iso-8859* and cp* code families as they are the most likely ones (EBCDIC codes have also been removed from the list)

CODES_LIST="
iso-8859-1
iso-8859-2
iso-8859-3
iso-8859-4
iso-8859-5
iso-8859-6
iso-8859-7
iso-8859-8
iso-8859-9
iso-8859-10
iso-8859-11
iso-8859-13
iso-8859-14
iso-8859-15
iso-8859-16
cp437
cp737
cp775
cp850
cp852
cp855
cp856
cp857
cp860
cp861
cp862
cp863
cp864
cp865
cp866
cp869
cp874
cp932
cp936
cp949
cp950
cp1250
cp1251
cp1252
cp1253
cp1254
cp1255
cp1256
cp1257
cp1258
"

# We check if the convmv utility is installed
path=`which convmv 2> /dev/null`
if [ -z "$path" ]; then
    echo -e "$RED ERROR: convmv is not installed, please install it by typing:"
    echo
    echo -e "$BLUE    sudo apt-get install convmv "
    echo
    echo -e "$RED ==> program exit"
    echo
    echo -e "$NORMAL"
    exit 1
fi

# To loop over all the codepages supported by convmv, the ALLCODES argument must be provided
if [ "$1" = "ALLCODES" ]; then
    CODES_LIST=`convmv --list`
    echo
    echo -e "$RED Check which original encoding seems correct (press 'y' and validate if waiting for display)$NORMAL"
    echo
fi

# Main loop of the program
for CODAGE in $CODES_LIST; do
    echo -e "$BLUE--- Encoding hypothesis: $RED $CODAGE $BLUE---$NORMAL"
    echo
    # echo -e "$RED Press 'y' and validate if no list is displayed $NORMAL"
    convmv -f $CODAGE -t utf-8 -r * 2>&1 | grep -v Perl | grep -v Starting | grep -v notest | grep -v Skipping > /tmp/affichage_convmv.txt
    NOMBRE_FICHIERS=`cat /tmp/affichage_convmv.txt | wc -l`
    if [ $NOMBRE_FICHIERS -eq 0 ]; then
        echo
        echo -e "$RED No filename to convert " $NORMAL
        echo
        echo -e "$BLUE Exiting program ... $NORMAL"
        echo
        rm /tmp/affichage_convmv.txt 2>/dev/null
        exit 0
    fi

    # sed 's ..  ' source.txt   ==> this removes the first 2 characters from a string
    echo -e $GREEN "Original filenames coded in $CODAGE: " $NORMAL
    # ALTERNATIVE cat /tmp/affichage_convmv.txt | cut -f 2 -d '"' | sed 's ..  '
    cat /tmp/affichage_convmv.txt | cut -f 2 -d '"'
    echo
    echo -e $GREEN "Filenames converted to UTF-8: " $NORMAL
    # ALTERNATIVE cat /tmp/affichage_convmv.txt | cut -f 4 -d '"' | sed 's ..  '
    cat /tmp/affichage_convmv.txt | cut -f 4 -d '"'
    echo

    echo -n -e $GREEN "Found encoding? $RED [N]$NORMAL""on /$RED o$NORMAL""ui /$RED q$NORMAL""uit: "
    read confirm
    echo

    # request for file conversion using convmv
    if [ "$confirm" = O ] || [ "$confirm" = o ];then
        echo -e "$BLUE Convert filenames now from encoding $CODAGE? $NORMAL"
        echo -e "$BLUE   ==> convmv -f $CODAGE -t utf-8 * --notest $NORMAL"
        echo -n -e $GREEN "Confirm conversion $RED [N]$NORMAL""on /$RED o$NORMAL""ui /$RED r$NORMAL""ecursive: "
        read confirm
        echo

        case $confirm in
            Y|y)    convmv -f $CODAGE -t utf-8 * --notest 2>/dev/null
                echo
                echo -e "$BLUE File name conversion done... $NORMAL" ;;
            R|r)    convmv -f $CODAGE -t utf-8 * -r --notest 2>/dev/null
                echo
                echo -e "$BLUE Recursive file name conversion done... $NORMAL" ;;
            *)      echo -e "$BLUE Exiting program... $NORMAL" ;;
        esac

        echo
        rm /tmp/affichage_convmv.txt 2>/dev/null
        exit 0

    # request for program exit
elif [ "$confirm" = Q ] || [ "$confirm" = q ];then
    echo -e "$BLUE Exiting program... $NORMAL"
    echo
    rm /tmp/affichage_convmv.txt 2>/dev/null
    exit 0
    fi
    clear
done
rm /tmp/affichage_convmv.txt 2>/dev/null
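
If the original encoding is already known (a site migrated from a Windows box is often cp1252), convmv can also be run directly without the interactive loop above. A sketch with a hypothetical web-root path; the first invocation is convmv's default dry run, the second actually renames:

convmv -f cp1252 -t utf-8 -r /var/www/html
convmv -f cp1252 -t utf-8 -r --notest /var/www/html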
RDK avatar
it flag
RDK
Hmmm, the original conversion had lots of pages using "windows-1252" and a few using "UTF-8" as the charset. The latter had issues displaying special characters in page content, usually as the "black diamond ?" symbol. The former seemed to have issues with JavaScript and also some display issues. After a web search on that issue I changed all "charset=" values to "iso-8859-1", which corrected all of those issues. But now I have the special characters in file names problem... I'll update my question with this information.
n0099 avatar
ve flag
The "black diamond ?" you called is accurately [Unicode replacement character](https://en.wikipedia.org/wiki/Specials_(Unicode_block)#Replacement_character)
Score:1
us flag

You need to open each file with Notepad++, Sublime, VS Code, or some other text editor that supports switching the charset. Switch the charset to UTF-8 and then save the file. If you are editing the files directly on Linux, you might consider using iconv to convert each file to UTF-8 encoding.
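
A minimal sketch of an iconv-based batch conversion, assuming the pages are currently windows-1252 (CP1252) and using a hypothetical web-root path; each file is replaced only if iconv succeeds:

find /var/www/html -type f -name '*.html' -print0 |
while IFS= read -r -d '' f; do
    iconv -f CP1252 -t UTF-8 "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done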

Then, after you've converted all your text-based files to the UTF-8 charset, test out nginx and the characters should display. If not, you can also try adding this line (in nginx.conf or whichever .conf file has your server configuration):

charset UTF-8;


The files and the web server must use the same charset, so converting everything to UTF-8 is the simplest way to avoid these issues in the future.
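
For reference, a sketch of where that directive could live (the server name and root are hypothetical):

server {
    listen 80;
    server_name example.com;
    root /var/www/html;

    # append "charset=utf-8" to the Content-Type header of text responses
    charset utf-8;
}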
