Score:6

How can I count each type of character (and total them) in a text file?


I was wondering if anyone could tell me how to count the occurrences of each different character in a text file, and also how to print a total of all those occurrences at the end.

I'm just trying to learn the process for my own knowledge.

waltinator
Or do you want "42 a, 33 b, 27 c, ..."? It's probably easy in `perl`.
Score:7

General count with wc

You can use `wc` to count lines, words, characters and bytes, but it cannot list the count for each separate character. See `man wc`.

Count number of each separate character

If you want to list the number for each separate character you can

  • start by printing each character to a separate line with grep
  • then sort them with sort
  • then use uniq to print the number of each kind

Examples

The examples assume that a dictionary file (word list) is available at /usr/share/dict/words

$ wc --lines --words --chars --bytes /usr/share/dict/words
102305 102305 971304 971578 /usr/share/dict/words

There are more bytes than characters because some characters consist of more than one byte (for example, the accented and umlaut characters in the list below).

  $ < /usr/share/dict/words grep -o '.' |sort |uniq -c
  29105 '
  65630 a
   1438 A
     12 á
      6 â
  14654 b
   1481 B
  31144 c
   1636 C
      5 ç
  28422 d
    844 D
  90579 e
    653 E
    148 é
     29 è
      6 ê
  10380 f
    538 F
  22501 g
    852 G
  19325 h
    919 H
  68343 i
    361 I
      2 í
   1482 j
    560 J
   8188 k
    680 K
  41512 l
    942 L
  21488 m
   1768 M
  58328 n
    587 N
      8 ñ
  50187 o
    409 O
     10 ó
      2 ô
  21691 p
   1049 P
   1492 q
     72 Q
  58312 r
    782 R
  92909 s
   1656 S
  53309 t
    908 T
  26773 u
    140 U
      3 û
   7870 v
   7281 w
    352 V
    533 W
   2139 x
     44 X
  12896 y
    154 Y
     14 ü
   3266 z
    161 Z
      3 å
      2 Å
      7 ä
     17 ö
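The commands above list the count per character but not the grand total the question asks for. As a rough sketch, you could let awk add up the counts that `uniq -c` prints. The function name `count_chars` and the temporary sample file are made up for illustration, and `grep -o` is assumed to be available (it is in GNU grep).

```shell
# Hypothetical helper: per-character counts followed by a grand total.
# grep -o '.' never matches the newline, so line breaks are not counted.
count_chars() {
    grep -o '.' "$1" | sort | uniq -c |
        awk '{ total += $1; print } END { print total + 0, "total" }'
}

# Small sample file, only for demonstration.
sample=$(mktemp)
printf 'abca\nbb\n' > "$sample"
count_chars "$sample"
```

Because newlines are skipped, the total will be lower than what `wc --chars` reports for the same file by the number of line breaks.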
phuclv
You don't need redirection to work with grep, because it can read the file directly. In fact, when you specify the file it can do many optimizations that can't be done with a stream.
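To illustrate the point, here is a small sketch comparing the two forms on a throwaway file. The variable names and the sample text are made up, and the sketch only checks that both forms give the same counts. It does not measure any speed difference.

```shell
# Throwaway sample file, only for demonstration.
sample=$(mktemp)
printf 'hello world\n' > "$sample"

# Form 1: grep reads standard input via redirection.
via_stdin=$(grep -o '.' < "$sample" | sort | uniq -c)

# Form 2: grep is given the file name as an argument.
via_arg=$(grep -o '.' "$sample" | sort | uniq -c)

# With a single file argument grep adds no file name prefix,
# so the two results should be identical.
if [ "$via_stdin" = "$via_arg" ]; then
    echo "same counts"
fi
```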
Score:5

There is a very simple way of counting each character in a text file. I have used your own question as a text file (called countc) and tested this code:

grep '.' -o countc | awk '{a[$1]++} END {for (i in a) print i,a[i]}'

and this is what you get:

' 1
h 9
u 6
 46
v 1
i 7
j 2
w 5
k 1
x 1
l 10
y 4
m 3
n 16
a 14
. 2
o 19
p 1
c 12
I 2
d 9
r 14
e 28
f 8
s 8
g 5
t 21

awk arrays are very useful for such operations.
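If you also want the total, the same array idea can be extended with a running counter. This is only a sketch: the function name `count_chars_awk` and the sample file are invented for illustration. It keys the array on `$0` instead of `$1`, so a space keeps its own label rather than showing up as an empty field like the `46` line above.

```shell
# Hypothetical helper: count every character with an awk array,
# then print a grand total as the last line.
count_chars_awk() {
    grep -o '.' "$1" |
        awk '{ count[$0]++; total++ }
             END {
                 # The order of "for (ch in count)" is unspecified in awk.
                 for (ch in count) printf "%s\t%d\n", ch, count[ch]
                 printf "total\t%d\n", total
             }'
}

# Small sample file, only for demonstration.
sample=$(mktemp)
printf 'a b a\n' > "$sample"
count_chars_awk "$sample"
```

Since awk does not guarantee the order of the per-character lines, you may want to collect them and pipe them through `sort` if you need stable output.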
