Score:6

Ubuntu

How can I count each type of character (and total them) in a text file?

james simmons

3/26/23, 7:42 PM

I was just wondering if anyone could tell me how to count the occurrences of each different character in a text file and also a total of all the occurrences of everything added together at the end.

I'm just trying to learn the process for my own knowledge.

command-line

text-processing

waltinator

3/27/23, 12:14 AM

Or do you want "42 a, 33 b, 27 c, ..."? It's probably easy in `perl`.

Score:7

sudodus

3/27/23, 5:09 AM

General count with `wc`

You can use `wc` to count lines, words, characters and bytes, but it cannot list a count for each separate character. See `man wc`.

Count number of each separate character

If you want to list the number for each separate character, you can:

1. start by printing each character on a separate line with `grep -o`
2. then sort the lines with `sort`
3. then use `uniq -c` to print the number of each kind

Examples

These examples assume that a dictionary file (word list) is available at /usr/share/dict/words

$ wc --lines --words --chars --bytes /usr/share/dict/words
102305 102305 971304 971578 /usr/share/dict/words

There are more bytes than characters because some characters are encoded in more than one byte: each accented character in the list below (for example á, é, ü and ö) takes two bytes in UTF-8.

  $ < /usr/share/dict/words grep -o '.' |sort |uniq -c
  29105 '
  65630 a
   1438 A
     12 á
      6 â
  14654 b
   1481 B
  31144 c
   1636 C
      5 ç
  28422 d
    844 D
  90579 e
    653 E
    148 é
     29 è
      6 ê
  10380 f
    538 F
  22501 g
    852 G
  19325 h
    919 H
  68343 i
    361 I
      2 í
   1482 j
    560 J
   8188 k
    680 K
  41512 l
    942 L
  21488 m
   1768 M
  58328 n
    587 N
      8 ñ
  50187 o
    409 O
     10 ó
      2 ô
  21691 p
   1049 P
   1492 q
     72 Q
  58312 r
    782 R
  92909 s
   1656 S
  53309 t
    908 T
  26773 u
    140 U
      3 û
   7870 v
   7281 w
    352 V
    533 W
   2139 x
     44 X
  12896 y
    154 Y
     14 ü
   3266 z
    161 Z
      3 å
      2 Å
      7 ä
     17 ö
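
The question also asks for a total at the end. One possible sketch, not part of the commands above, is to let `awk` add up the first column of the `uniq -c` output. The function name `count_chars` and the demo file content are made up for illustration. Note that `grep -o .` does not match newlines, so this total is smaller than the character count from `wc --chars`, and multi-byte characters are only kept whole in a UTF-8 locale.

```shell
# Hypothetical helper: per-character counts followed by a grand total.
count_chars() {
    # grep -o . prints one character per line (newlines are not matched),
    # sort groups identical characters, uniq -c counts each group,
    # and awk passes every line through while summing the counts.
    grep -o . "$1" | sort | uniq -c |
        awk '{ total += $1; print } END { print total + 0, "total" }'
}

# Small demo file with made-up content
sample=$(mktemp)
printf 'abca\nbb\n' > "$sample"
result=$(count_chars "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

To run it on the dictionary file, call `count_chars /usr/share/dict/words`.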

phuclv

3/27/23, 4:11 PM

You don't need redirection to work with grep, because it can read the file directly. In fact, when you specify the file name it can apply many optimizations that are not possible with a stream.
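
As a rough illustration of this comment, the sketch below runs the same pipeline both ways on a small demo file and compares the results. The file content and the variable names `via_stdin` and `via_file` are made up for the example; it only shows that the counts are the same, not any difference in speed.

```shell
# Demo file with made-up content (stands in for /usr/share/dict/words)
sample=$(mktemp)
printf 'hello\nworld\n' > "$sample"

# Redirection: grep reads the data from standard input
via_stdin=$(< "$sample" grep -o . | sort | uniq -c)

# File argument: grep opens the file itself
via_file=$(grep -o . "$sample" | sort | uniq -c)

# Both forms should produce identical counts
[ "$via_stdin" = "$via_file" ] && echo "same counts"
rm -f "$sample"
```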

Score:5

elmclose

3/27/23, 8:06 AM

There is a very simple way of counting each character in a text file. I have used your own question as a text file (called countc) and tested this code:

grep '.' -o countc | awk '{a[$1]++} END {for (i in a) print i,a[i]}'

and this is what you get:

' 1
h 9
u 6
 46
v 1
i 7
j 2
w 5
k 1
x 1
l 10
y 4
m 3
n 16
a 14
. 2
o 19
p 1
c 12
I 2
d 9
r 14
e 28
f 8
s 8
g 5
t 21

awk arrays are very useful for such operations.
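
A possible variation, offered only as a sketch, also prints the total that the question asks for. It keys the array on `$0` (the whole line) instead of `$1`: with `$1`, a line holding only a space has an empty first field, which is why the space shows up above as " 46" with no visible character, and a tab would be counted under the same empty key. The function name `count_chars_awk` and the demo text are made up for illustration.

```shell
# Hypothetical helper: count characters in an awk array and print a total.
count_chars_awk() {
    grep -o . "$1" |
        awk '{ count[$0]++; total++ }            # $0 keeps spaces and tabs distinct
             END {
                 for (c in count) print c, count[c]   # order of "for in" is unspecified
                 print "total", total + 0
             }'
}

# Small demo file with made-up content
sample=$(mktemp)
printf 'a b a\n' > "$sample"
result=$(count_chars_awk "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

The per-character lines can come out in any order; the total is always printed last because it is emitted after the loop in the `END` block.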
