Score:1

Ubuntu

Is there a command-line method for converting UTF-8 values into Unicode values?

Ray Butterworth

6/18/24, 3:27 AM

`od -t x1 <inputfile` produces `… 0a e2 8c a5 0a …`.

The `0a` bytes are linefeeds, and `e2 8c a5` is the UTF-8 representation of a single Unicode character.

For this simple case, I can do it by hand:

[1110]0010  [10]001100  [10]100101  =  0010  0011  0010  0101  =  2 3 2 5

What shell command line can convert e2 8c a5 or e28ca5 into 2325?

(For completeness, converting the other way would be good to know too.)
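The hand calculation above can be written out with plain shell arithmetic. The following is only a minimal sketch, assuming a shell that supports `local` (bash and dash both do); the function names `utf8_to_cp` and `cp_to_utf8` are made up for illustration and are not existing commands. It handles one character per call and does not reject overlong encodings or surrogate code points.

```sh
#!/bin/sh
# Hypothetical helpers; the names are illustrative, not existing tools.

# utf8_to_cp HEX: decode one UTF-8 sequence given as hex (e28ca5 or "e2 8c a5").
utf8_to_cp() {
    local hex rest pair lead byte cp n
    hex=$(printf '%s' "$1" | tr -d ' ')
    case $hex in
        '' | *[!0-9A-Fa-f]*) echo "not hex bytes: $1" >&2; return 1 ;;
    esac
    [ $(( ${#hex} % 2 )) -eq 0 ] || { echo "odd number of hex digits" >&2; return 1; }
    pair=${hex%"${hex#??}"}          # first two hex digits
    rest=${hex#??}                   # everything after them
    lead=$(( 0x$pair ))
    # The lead byte's high bits give the sequence length; the rest is payload.
    if   [ $(( lead < 0x80 )) -eq 1 ];           then n=1; cp=$lead
    elif [ $(( (lead & 0xE0) == 0xC0 )) -eq 1 ]; then n=2; cp=$(( lead & 0x1F ))
    elif [ $(( (lead & 0xF0) == 0xE0 )) -eq 1 ]; then n=3; cp=$(( lead & 0x0F ))
    elif [ $(( (lead & 0xF8) == 0xF0 )) -eq 1 ]; then n=4; cp=$(( lead & 0x07 ))
    else echo "invalid lead byte" >&2; return 1
    fi
    [ "${#hex}" -eq $(( 2 * n )) ] || { echo "expected $n bytes" >&2; return 1; }
    # Each continuation byte is 10xxxxxx and contributes its low 6 bits.
    while [ -n "$rest" ]; do
        pair=${rest%"${rest#??}"}
        rest=${rest#??}
        byte=$(( 0x$pair ))
        [ $(( (byte & 0xC0) == 0x80 )) -eq 1 ] || { echo "invalid continuation byte" >&2; return 1; }
        cp=$(( (cp << 6) | (byte & 0x3F) ))
    done
    printf '%04X\n' "$cp"
}

# cp_to_utf8 HEX: encode one code point given as hex (2325 or U+2325).
cp_to_utf8() {
    local h cp
    h=${1#U+}
    case $h in
        '' | *[!0-9A-Fa-f]*) echo "not a hex code point: $1" >&2; return 1 ;;
    esac
    cp=$(( 0x$h ))
    if   [ $(( cp < 0x80 )) -eq 1 ]; then
        printf '%02x\n' "$cp"
    elif [ $(( cp < 0x800 )) -eq 1 ]; then
        printf '%02x%02x\n' $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
    elif [ $(( cp < 0x10000 )) -eq 1 ]; then
        printf '%02x%02x%02x\n' $(( 0xE0 | (cp >> 12) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
    elif [ $(( cp < 0x110000 )) -eq 1 ]; then
        printf '%02x%02x%02x%02x\n' $(( 0xF0 | (cp >> 18) )) \
            $(( 0x80 | ((cp >> 12) & 0x3F) )) \
            $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
    else echo "beyond U+10FFFF" >&2; return 1
    fi
}

utf8_to_cp e28ca5    # prints 2325
cp_to_utf8 2325      # prints e28ca5
```

This only mirrors the bit-shuffling done by hand; it is not a substitute for a real decoder such as `iconv` when the input may be malformed.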


command-line

unicode

utf-8

format-conversion

Hannu

6/18/24, 6:50 AM

https://stackoverflow.com/a/147756

FedKad

6/18/24, 7:35 AM

See https://manpages.ubuntu.com/manpages/en/man1/iconv.1.html

Ray Butterworth

6/18/24, 11:45 AM

@Hannu, that question is about Python, not about the command-line shell.

Ray Butterworth

6/18/24, 11:56 AM

@FedKad, `echo "e28ca5" | iconv -f UTF-8 -t UNICODE` produces `��e28ca5`, not `2325`. (Actually, it produces `ff fe 65 00 32 00 38 00 63 …`, which looks like UTF-16 for the individual characters.)

sudodus

6/18/24, 1:25 PM

Try `<<<'e28ca5' xxd -r -p | iconv -t unicode | hexdump`. See also the output of `<<<'e28ca5' xxd -r -p` (I get that special character in the gnome-terminal window).

Hannu

6/18/24, 3:42 PM

@RayButterworth, the Python code does exactly what you're asking about, I'd say. Learn Python; you will be up and running faster than you can believe. Had you read further down, you would have seen that `iconv` is mentioned too.

Ray Butterworth

6/18/24, 6:17 PM

@Hannu, I know Python. I wasn't asking how to do it in Python; I was asking whether there is an existing command-line tool for this, since using existing tools is the Unix philosophy. If you know for sure that there is no such way, post it as an answer and I'll upvote and accept it. Until then, *you* need to learn to read the question.

Score:2


sudodus

6/18/24, 7:15 PM

Try this command line to get what you want, '2325':

$ <<<'e28ca5' xxd -r -p | iconv -t unicode | hexdump
0000000 feff 2325                              
0000004

See also the output from the first part of the command line

<<<'e28ca5' xxd -r -p

(there is no line feed, so the prompt comes directly after its output):

sudodus@c30 ~ $ <<<'e28ca5' xxd -r -p
⌥sudodus@c30 ~ $

As you can see, I get that special character ⌥ in the gnome-terminal window.
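A possible variation, offered as a sketch rather than a definitive recipe: `iconv -t unicode` prepends a byte-order mark (the `feff` above), and `hexdump` by default prints 16-bit words in the host's byte order, so the display can differ between machines. UTF-16 also represents characters above U+FFFF as surrogate pairs rather than as the code point itself. Asking for `UTF-32BE` and dumping byte by byte avoids all three issues. The example assumes `xxd` and a glibc-style `iconv` are installed; the function name `utf8_hex_to_utf32` is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: hex of UTF-8 bytes in, hex of the UTF-32BE code point out.
# Assumes xxd and iconv are available on the PATH.
utf8_hex_to_utf32() {
    printf '%s' "$1" |
        xxd -r -p |                      # hex digits -> raw bytes
        iconv -f UTF-8 -t UTF-32BE |     # big-endian, no byte-order mark
        xxd -p                           # raw bytes -> plain hex
}

utf8_hex_to_utf32 e28ca5    # prints 00002325
```

The result is zero-padded to eight hex digits per character, so leading zeros may need stripping if the bare `2325` form is wanted.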


Score:1


steeldriver

6/18/24, 11:07 PM

Using the Perl Encode module, you could:

1. pack the character string back into a sequence of bytes¹
2. decode the byte sequence as a UTF-8 character
3. encode the result as UTF-16be
4. unpack it to get the hexadecimal code point

$ printf '%s' 'e28ca5' | perl -MEncode=encode,decode -nE '
    say unpack("H*", encode("UTF-16be", decode("UTF-8", pack("H*",$_))))
'
2325

¹ This step is only required because you've unpacked it with od. If you start with the character itself or the byte sequence, you just need the decode-encode:

 $ printf '⌥' | perl -MEncode=encode,decode -nE '
     say unpack("H*", encode("UTF-16be", decode("UTF-8", $_)))
 '
 2325

 $ printf '\xe2\x8c\xa5' | perl -MEncode=encode,decode -nE '
     say unpack("H*", encode("UTF-16be", decode("UTF-8", $_)))
 '
 2325


Score:0


raj

6/18/24, 11:45 PM

Try the following:

iconv -f utf8 -t ucs2 <inputfile | hexdump -v -e '/2 "%04x "'

For an inputfile containing the three bytes e2 8c a5, it outputs 2325.
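For the opposite direction that the question also asks about, the same kind of pipeline can probably be run in reverse. This is a sketch under assumptions: it relies on `xxd` being installed and on `iconv` accepting `UTF-32BE` as a source encoding (glibc's does), and the function name `cp_hex_to_utf8` is invented for illustration.

```sh
#!/bin/sh
# Hypothetical helper: hex code point in (2325 or U+2325), hex of UTF-8 bytes out.
# Assumes xxd and iconv are available on the PATH.
cp_hex_to_utf8() {
    printf '%08x' "0x${1#U+}" |          # pad to 4 bytes: 00002325
        xxd -r -p |                      # hex digits -> raw UTF-32BE bytes
        iconv -f UTF-32BE -t UTF-8 |     # re-encode as UTF-8
        xxd -p                           # raw bytes -> plain hex
}

cp_hex_to_utf8 2325    # prints e28ca5
```

Dropping the final `xxd -p` writes the character itself to the terminal instead of its hex bytes.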

