There are many ways to accomplish this in any number of programming languages, but if you're looking for something that will work on just about any Linux-based machine without requiring additional libraries, you can use a pipeline like this:
cat {filename} | tr -s ' ' '\n' | sort | uniq -c | sort -r | awk '{ print $2, $1 }'
Note: Replace {filename} with the name of the file containing the data you wish to parse.
Using your sample data set copied into a file several times gives the following result:
G 77
T 33
A 33
C 22
CT 11
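To try the pipeline without creating a file first, here is a minimal sketch that feeds a small, made-up input (hypothetical, not the data set from the question) through exactly the same commands. It sets LC_ALL=C so that sort compares bytes and the output order is reproducible across machines.

```shell
#!/bin/sh
# Byte-wise (C locale) sorting so the output order is reproducible.
export LC_ALL=C

# Hypothetical sample input; this is NOT the data set from the question.
# Note the double space on the second line: tr -s squeezes it away.
input='G T A G
C G T  A
G CT'

# Same pipeline as above, but fed from a variable instead of {filename}.
result=$(printf '%s\n' "$input" | tr -s ' ' '\n' | sort | uniq -c | sort -r | awk '{ print $2, $1 }')
printf '%s\n' "$result"
```

With this input the script prints G 4, T 2, A 2, CT 1 and C 1, one pair per line. Strings with the same count come out in reverse alphabetical order because sort -r reverses the whole line, not just the count.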
How this works
This is actually a pipeline of several common Linux commands, each one feeding its output into the next. This is what each of them does:
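| Command | What it does |
| --- | --- |
| cat {filename} | Read the given file. |
| tr -s ' ' '\n' | Translate (or transliterate, depending on who you ask) spaces to newlines, which puts every string on its own line. The -s flag squeezes runs of newlines into one, so repeated spaces don't produce empty lines. |
| sort | Sort the strings so that identical ones are next to each other (uniq only detects adjacent duplicates). |
| uniq -c | Collapse each run of identical lines into a single line, prefixed with the number of times it occurred. |
| sort -r | Sort the counted lines in reverse (descending) order, so the most frequent strings come first. |
| awk '{ print $2, $1 }' | Print each line's second field (the string) followed by its first field (the count). |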
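If your data may contain tabs or other whitespace, or you want the ordering not to depend on how uniq -c pads its counts, a slightly more defensive variant of the same pipeline might look like the sketch below. This is an alternative, not the command above: it reads the file with a redirection instead of cat, splits on any whitespace using tr's [:space:] class, sorts the counts numerically with sort -rn, and skips the empty string that a file beginning with whitespace would otherwise produce. The function name count_strings is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: a defensive variant of the pipeline above.
count_strings() {
  # [:space:] covers spaces, tabs, newlines, etc.; -s squeezes repeated newlines.
  # sort -rn compares the counts as numbers, largest first.
  # awk 'NF == 2' drops the count line for an empty string (leading whitespace).
  tr -s '[:space:]' '\n' < "$1" | sort | uniq -c | sort -rn | awk 'NF == 2 { print $2, $1 }'
}

# Usage (data.txt is a placeholder file name):
#   count_strings data.txt
```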
Note that if you use this on files that are several gigabytes in size, the first sort has to order every single string in the file, so the machine needs a decent amount of memory or enough free temporary disk space (GNU sort spills to temporary files in $TMPDIR, or /tmp by default, when its in-memory buffer fills up).
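For very large files, one swapped-in technique (not the pipeline above) is to count with an awk associative array. awk then holds only the distinct strings and their counts in memory, and only that list of distinct strings is sorted afterwards, which helps most when the file contains many repeats of relatively few strings. If nearly every string is unique, it saves little. The function name count_strings_awk is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count whitespace-separated strings with an awk hash table.
count_strings_awk() {
  # awk's default field splitting treats runs of spaces and tabs as one separator.
  # count[] keeps one entry per distinct string, not one per occurrence.
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (s in count) print s, count[s] }' "$1" |
    # Sort the distinct strings by their count (field 2), numerically, largest first.
    sort -k2,2nr
}

# Usage (data.txt is a placeholder file name):
#   count_strings_awk data.txt
```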