Score:0

Word count for multiple .txt files in linux

mx flag

I need to count the words in multiple .txt files using the Linux CLI. Currently I am using the following command:

cat *.txt|wc -w

I made a test directory to practice the command. It works for each individual .txt file, but it gives the wrong result for all the .txt files together. The directory contains 5 files: 4 of them contain 5 words each and 1 is empty. For each individual file, cat textfile.txt|wc -w gives the right answer. But for the combined count it gives 17 when it should be 20 (4 × 5 + 0). Can someone tell me why the count given is 17 while the real count is 20?

pLumo avatar
in flag
cannot reproduce, you will need to add your input files.
pLumo avatar
in flag
These links have nothing to do with the question.
Score:3
tr flag

You can run

wc -w *.txt

This will give you the word count for each file and a total sum in the last row.

As it turned out, the OP's issue was a missing newline at the end of some of the files. This caused cat *.txt to join the last word of one file to the first word of the next, which resulted in a wrong count. The command above is more robust in this situation because it processes each file individually.
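If only the grand total is needed, the per-file counts can also be added up explicitly. The following is a minimal sketch, not the answerer's own method: it assumes a scratch directory created with mktemp (the `tmpdir` and `total` variable names are made up for illustration) and recreates the OP's situation of four 5-word files without a final newline plus one empty file.

```shell
# Scratch directory with four unterminated 5-word files and one empty file
# (hypothetical test data mirroring the question).
tmpdir=$(mktemp -d)
for i in 1 2 3 4; do
    printf 'foo\nbar\nbaz\nbam\nboo' > "$tmpdir/$i.txt"
done
: > "$tmpdir/5.txt"

# Count the words of each file separately, then sum the counts with awk.
total=$(for f in "$tmpdir"/*.txt; do wc -w < "$f"; done |
        awk '{ sum += $1 } END { print sum + 0 }')
echo "$total"

rm -rf "$tmpdir"
```

Because every file is counted on its own, words at file boundaries are never merged, so the sum does not depend on how each file ends.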

mx flag
I found out why: there was no separator, so the last word of the previous file got attached to the first word of the following file.
mx flag
Thanks @wayne_yux, I have been struggling with this the whole morning with a deadline coming up. Such a stupid small thing. Your solution works, HERO :)
Maarten Meijer avatar
fr flag
If you put a newline or space at the end of all the files (this happens automatically with `echo` for example), you could just use the command in the question
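A variation on this idea that leaves the files untouched, offered as a sketch rather than something suggested in the thread: `awk 1` prints every input line followed by a newline, so it supplies the missing terminator at the end of each file before the text reaches `wc -w`. The scratch directory and the `tmpdir` and `total` names are assumptions for illustration.

```shell
# Hypothetical test data: four unterminated 5-word files and one empty file.
tmpdir=$(mktemp -d)
for i in 1 2 3 4; do
    printf 'foo\nbar\nbaz\nbam\nboo' > "$tmpdir/$i.txt"
done
: > "$tmpdir/5.txt"

# awk 1 re-emits each line with a trailing newline, so the last word of one
# file can no longer run into the first word of the next.
total=$(awk 1 "$tmpdir"/*.txt | wc -w)
echo "$total"

rm -rf "$tmpdir"
```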
Wayne_Yux avatar
tr flag
@pLumo doing a `cat` first will give you only the total number of words. If you run `wc -w` on all files, you will get a number per file. That makes debugging way easier. As it seems, the OP's issue was that there was no newline at the end of one file. That would cause `cat` to combine some words. If you handle the files individually, this does not occur.
pLumo avatar
in flag
true true :-) makes sense.
Score:1
hr flag

The most likely explanation is that the final lines of your files are not properly newline-terminated, so that when you cat them, the first word of the next file gets appended to the last word of the previous file:

Ex. given

steeldriver@pc:~$ printf 'foo\nbar\nbaz\nbam\nboo' | tee {1..4}.txt
foo
bar
baz
bam
boosteeldriver@pc:~$ printf '' > 5.txt

then

steeldriver@pc:~$ wc -w {1..5}.txt
 5 1.txt
 5 2.txt
 5 3.txt
 5 4.txt
 0 5.txt
20 total

but

steeldriver@pc:~$ cat {1..5}.txt | wc -w
17
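To find out which files are affected, one option is to look at the last byte of each file. This is a minimal sketch under assumed file names (`good.txt`, `bad.txt`, `empty.txt` in a scratch directory); `unterminated` is a variable name chosen for illustration.

```shell
# Hypothetical test data: one terminated file, one unterminated, one empty.
tmpdir=$(mktemp -d)
printf 'foo bar\n' > "$tmpdir/good.txt"
printf 'foo bar'   > "$tmpdir/bad.txt"
: > "$tmpdir/empty.txt"

unterminated=$(
    for f in "$tmpdir"/*.txt; do
        # Skip empty files. For the rest, tail -c 1 yields the last byte,
        # and wc -l reports 0 when that byte is not a newline.
        if [ -s "$f" ] && [ "$(tail -c 1 "$f" | wc -l)" -eq 0 ]; then
            basename "$f"
        fi
    done
)
echo "$unterminated"

rm -rf "$tmpdir"
```

Here only `bad.txt` is reported: `good.txt` ends in a newline and the empty file is skipped, since it contributes no words either way.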