Score:1

Is there a command-line method for converting UTF-8 values into Unicode values?


`$ od -t x1 <inputfile` produces `… 0a e2 8c a5 0a …`.

The 0a bytes are linefeeds, and e2 8c a5 is the UTF-8 representation of a single Unicode character.

For this simple case, I can do it by hand:

[1110]0010  [10]001100  [10]100101  =  10  0011  0010  0101 = 2 3 2 5
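The same bit masking can be written out in shell arithmetic as a cross-check on the hand calculation. This is only a sketch: `utf8_hex_to_codepoint` is a made-up helper name, not an existing tool, and it does not verify that the number of continuation bytes matches the lead byte.

```shell
# Hypothetical helper (not an existing tool): decode one UTF-8 sequence,
# given as hex bytes ("e2 8c a5" or "e28ca5"), into its code point in hex.
utf8_hex_to_codepoint() {
    # Normalise the input to one two-digit hex byte per positional parameter.
    set -- $(printf '%s' "$*" | tr -d ' ' | sed 's/../& /g')
    lead=$(( 0x$1 )); shift
    # Strip the length prefix from the lead byte.
    if   [ $(( lead & 0x80 )) -eq 0 ];            then cp=$lead               # 0xxxxxxx
    elif [ $(( lead & 0xe0 )) -eq $(( 0xc0 )) ];  then cp=$(( lead & 0x1f ))  # 110xxxxx
    elif [ $(( lead & 0xf0 )) -eq $(( 0xe0 )) ];  then cp=$(( lead & 0x0f ))  # 1110xxxx
    elif [ $(( lead & 0xf8 )) -eq $(( 0xf0 )) ];  then cp=$(( lead & 0x07 ))  # 11110xxx
    else echo "invalid lead byte" >&2; return 1
    fi
    # Each continuation byte (10xxxxxx) contributes its low six bits.
    for byte in "$@"; do
        cp=$(( (cp << 6) | (0x$byte & 0x3f) ))
    done
    printf '%04x\n' "$cp"
}

utf8_hex_to_codepoint e2 8c a5   # prints 2325
utf8_hex_to_codepoint e28ca5     # prints 2325
```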

What shell command line can convert e2 8c a5 or e28ca5 into 2325?

(For completeness, converting the other way would be good to know too.)

Hannu avatar
https://stackoverflow.com/a/147756
FedKad avatar
See https://manpages.ubuntu.com/manpages/en/man1/iconv.1.html
Ray Butterworth avatar
@Hannu, that question is about Python, not about the command-line shell.
Ray Butterworth avatar
@FedKad, `echo "e28ca5" | iconv -f UTF-8 -t UNICODE` produces `��e28ca5`, not `2325`. (Actually, it produces `ff fe 65 00 32 00 38 00 63 …`, which looks like UTF-16 for the individual characters.)
sudodus avatar
Try `<<<'e28ca5' xxd -r -p | iconv -t unicode | hexdump`. See also the output of `<<<'e28ca5' xxd -r -p` (I get that special character in the gnome-terminal window).
Hannu avatar
@RayButterworth, I'd say the Python code does exactly what you're asking about. Learn Python and you will be up and running faster than you can believe. Further down, had you read it, `iconv` is mentioned.
Ray Butterworth avatar
@Hannu, I know Python. I wasn't asking how to do it in Python; I was asking whether there is an existing command-line tool for this, since using existing tools is the Unix philosophy. If you know for sure that there is no such way, post it as an answer and I'll upvote and accept it. Until then, *you* need to learn to read the question.
Score:2

Try this command line to get what you want, '2325':

$ <<<'e28ca5' xxd -r -p | iconv -t unicode | hexdump
0000000 feff 2325                              
0000004

See also the output from the first part of the command line

<<<'e28ca5' xxd -r -p

(there is no line feed, so the prompt comes directly after its output):

sudodus@c30 ~ $ <<<'e28ca5' xxd -r -p
⌥sudodus@c30 ~ $

As you can see, I get that special character in the gnome-terminal window.
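The `feff` in the hexdump output above is a byte-order mark, and UTF-16 would show a code point above U+FFFF as a surrogate pair rather than as the code point itself. A possible variant that avoids both is sketched below. It assumes `xxd` is installed and that your `iconv` knows the encoding name `UTF-32BE` (GNU iconv does); the function name `utf8hex_to_utf32hex` is made up for illustration.

```shell
# Hypothetical wrapper around the same tools: hex-encoded UTF-8 in,
# one zero-padded code point per line out.
utf8hex_to_utf32hex() {
    printf '%s' "$1" |
        xxd -r -p |                       # hex digits -> raw bytes
        iconv -f UTF-8 -t UTF-32BE |      # fixed byte order, so no BOM is written
        xxd -p -c 4                       # 4 bytes (one code point) per output line
}
```

With this definition, `utf8hex_to_utf32hex e28ca5` should print `00002325`, and `utf8hex_to_utf32hex f09f9880` should print `0001f600`.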

Score:1

Using the Perl Encode module, you could

  1. pack the character string back into a sequence of bytes [1]
  2. decode the byte sequence as a UTF-8 character
  3. encode the result as UTF-16be
  4. unpack it to get the hexadecimal code point

So

$ printf '%s' 'e28ca5' | perl -MEncode=encode,decode -nE '
    say unpack("H*", encode("UTF-16be", decode("UTF-8", pack("H*",$_))))
'
2325

  1. This step is only required because you've unpacked it with od. If you start with the character itself or its byte sequence, you just need the decode-encode:

     $ printf '⌥' | perl -MEncode=encode,decode -nE '
         say unpack("H*", encode("UTF-16be", decode("UTF-8", $_)))
     '
     2325
    

    or

     $ printf '\xe2\x8c\xa5' | perl -MEncode=encode,decode -nE '
         say unpack("H*", encode("UTF-16be", decode("UTF-8", $_)))
     '
     2325
    
Score:0
raj

Try the following:

iconv -f utf8 -t ucs2 <inputfile | hexdump -v -e '/2 "%04x "'

For inputfile containing three bytes with values e2 8c a5 it outputs 2325.
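For the reverse direction that the question also mentions, the same kind of pipeline can probably be run the other way round. The following is a sketch, not a tested recipe for every system: it assumes `xxd` is installed and that `iconv` accepts `UTF-32BE`, and `codepoint_to_utf8hex` is a hypothetical name.

```shell
# Hypothetical helper: hex code point in (e.g. 2325), UTF-8 bytes in hex out.
codepoint_to_utf8hex() {
    printf '%08x' "0x$1" |               # pad to one 4-byte UTF-32 unit: 00002325
        xxd -r -p |                      # hex digits -> raw bytes
        iconv -f UTF-32BE -t UTF-8 |     # re-encode the code point as UTF-8
        xxd -p                           # raw bytes -> plain hex
}
```

With this definition, `codepoint_to_utf8hex 2325` should print `e28ca5`.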
