Score:-4

Count the frequency of all values in a CSV


I have a large CSV file (without header or index), for example:

A T C G
G T A C
CT T A G
G G G G

I want to count all values across the entire CSV (not a specific column or row); the output would be:

A 3
T 3
C 2
G 7
CT 1

How can I do this with Linux?

24601: Homework question? What have you tried, and what was the result?
Given the lack of constraints on the solution, perhaps this question would be better suited to [Code Golf](https://codegolf.stackexchange.com) where people can answer the question in 18+ obscure programming languages?
ARWA ABDULKADER BASHANFAR: @24601 No, it's not homework, but I really need this step for my project. I searched and tried some code, but it didn't give me the result I want; the examples usually count values for a specific column, not the whole file like I need.
ARWA ABDULKADER BASHANFAR: @matigo Yesterday I tried to do this with Python, but it took a long time and in the end the kernel died. I only have Python and Linux, so I want to do it in Linux if possible, because I can't install other programming languages :(
You could match any whole word using the PCRE pattern `\w+`: `grep -oP "\w+" my.csv | sort | uniq -c | sort -rn`. Also check out ripgrep.
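If your `grep` does not support `-P` (PCRE support is an optional, GNU-specific feature), a rough equivalent that splits on runs of whitespace should give the same counts for data like the sample above; this is only a sketch under that assumption:

```sh
# Print every maximal run of non-whitespace characters on its own line,
# then count the distinct values and sort by count, highest first.
grep -oE '[^[:space:]]+' my.csv | sort | uniq -c | sort -rn
```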
Score:2

There are many ways to accomplish this with any number of programming languages but, if you're looking for something that will work on just about any Linux-based machine without requiring additional libraries, you can do something like this:

cat {filename} | tr -s ' ' '\n' | sort | uniq -c | sort -r | awk '{ print $2, $1 }'

Note: be sure to replace only `{filename}` with the name of the file containing the data you wish to parse.

Using your sample data set copied into a file several times gives the following result:

G 77
T 33
A 33
C 22
CT 11

How this works

This is actually a combination of several common Linux commands. Here is how each one works:

| Command | What it does |
| --- | --- |
| `cat {filename}` | Read the given file |
| `tr -s ' ' '\n'` | Translate (or transliterate, depending on who you ask) the spaces to newlines, which puts every string on its own line |
| `sort` | Sort the strings |
| `uniq -c` | Count the unique strings |
| `sort -r` | Reverse the sort result |
| `awk '{ print $2, $1 }'` | Scan the sorted data and output it in the requested format (value first, then count) |
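A small variation, not from the original answer, that may be more robust: squeeze every kind of whitespace (tabs as well as spaces) into newlines and sort the counts numerically rather than lexically:

```sh
# Same idea as above, but split on any whitespace and sort counts as numbers.
# {filename} is the placeholder used in the answer; substitute your own file.
tr -s '[:space:]' '\n' < {filename} | sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
```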

Note that if you are using this on files that are several gigabytes in size, you will need a machine that either has a decent amount of memory or is configured to manage its memory properly.
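If memory does become a problem, a single pass with `awk` keeps one counter per distinct value instead of sorting every token in the file. This is only a sketch, again assuming whitespace-separated fields and the same `{filename}` placeholder:

```sh
# Count every whitespace-separated field in one pass; memory use grows with
# the number of distinct values, not with the size of the file.
awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
     END { for (v in count) print v, count[v] }' {filename} | sort -k2,2nr
```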

ARWA ABDULKADER BASHANFAR: Thank you, but what if a value contains a space? For example, one value is `A C`, and as I understand it this code splits on spaces, so that value would be counted as two values: one A and one C. Is there any solution for this case?
Based on the type of data that is in the example, I'm going to assume that you are a student in some sort of medical or chemical science field. As such, allow me to share with you a *very important lesson*: state requirements up front. When people help a person only to hear "Oh, I have all these other things that need to be considered that I never told you", it becomes *very* easy for people to ignore all future requests. I work in education and have a lot of colleagues who have a lot of difficulty getting tech support because they never communicate everything up front.
ARWA ABDULKADER BASHANFAR: Thank you very much for your help and also for the advice. Yesterday I posted my real data here https://stackoverflow.com/questions/74992499/count-specific-value-over-all-dataframe-in-python/74992682?noredirect=1# and it ran for more than 10 hours, so I switched to Linux and thought any data I wrote here would be fine, but when I tried the suggested code it did not work for my data. So, can you please help me with my real data? It's the same data, but I converted the dataframe to CSV without the header and index. THANKS VERY MUCH AGAIN.
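For what it is worth, if the real file is comma-delimited (so a value such as `A C` sits in a single comma-separated field), splitting on commas instead of whitespace should keep it intact. This is only a sketch under that assumption, and it does not handle quoted fields:

```sh
# Assumes a plain comma-delimited CSV with no quoting; each comma-separated
# field (which may itself contain spaces) is counted as one value.
# The output lines come out in arbitrary order.
awk -F',' '{ for (i = 1; i <= NF; i++) count[$i]++ }
           END { for (v in count) print v, count[v] }' {filename}
```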