Score:-4

Count the frequency of all values in a CSV


I have a large CSV file (without header or index), for example:

A T C G
G T A C
CT T A G
G G G G

I want to count all values across the entire CSV (not a specific column or row); the output would be:

A 3
T 3
C 2
G 7
CT 1

How can I do this with Linux?

24601: Homework question? What have you tried, and what was the result?
Given the lack of constraints on the solution, perhaps this question would be better suited to [Code Golf](https://codegolf.stackexchange.com) where people can answer the question in 18+ obscure programming languages?
ARWA ABDULKADER BASHANFAR: @24601 No, it's not homework, but I really need this step for my project. I searched and tried some code, but it didn't give me the result I want; the examples usually count values for a specific column, not the whole file like I need.
ARWA ABDULKADER BASHANFAR: @matigo Yesterday I tried to do this with Python, but it took a long time and in the end the kernel died. I only have Python and Linux, so I want to do it in Linux if possible, because I can't install other programming languages :(
You could match any whole word using the PCRE pattern `\w+`: `grep -oP "\w+" my.csv | sort | uniq -c | sort -rn`. Also check out ripgrep.
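If your `grep` does not support `-P` (PCRE support is an optional, GNU-specific feature), a rough equivalent that splits on runs of whitespace should give the same counts for data like the sample above; this is only a sketch under that assumption:

```sh
# Print every maximal run of non-whitespace characters on its own line,
# then count the distinct values and sort by count, highest first.
grep -oE '[^[:space:]]+' my.csv | sort | uniq -c | sort -rn
```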
Score:2

There are many ways to accomplish this with any number of programming languages but, if you're looking for something that will work on just about any Linux-based machine without requiring additional libraries, you can do something like this:

cat {filename} | tr -s ' ' '\n' | sort | uniq -c | sort -r | awk '{ print $2, $1 }'

Note: be sure to replace only `{filename}` with the name of the file containing the data you wish to parse.

Using your sample data set copied into a file several times gives the following result:

G 77
T 33
A 33
C 22
CT 11

How this works

This is actually a combination of several common Linux commands. Here is how each one works:

| Command | What it does |
| --- | --- |
| `cat {filename}` | Read the given file |
| `tr -s ' ' '\n'` | Translate (or transliterate, depending on who you ask) the spaces to newlines, which puts every string on its own line |
| `sort` | Sort the strings |
| `uniq -c` | Count the unique strings |
| `sort -r` | Reverse the sort result |
| `awk '{ print $2, $1 }'` | Scan the sorted data and output it in the requested format (value first, then count) |
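A small variation, not from the original answer, that may be more robust: squeeze every kind of whitespace (tabs as well as spaces) into newlines and sort the counts numerically rather than lexically:

```sh
# Same idea as above, but split on any whitespace and sort counts as numbers.
# {filename} is the placeholder used in the answer; substitute your own file.
tr -s '[:space:]' '\n' < {filename} | sort | uniq -c | sort -rn | awk '{ print $2, $1 }'
```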

Note that if you are using this on files that are several gigabytes in size, you will need a machine that either has a decent amount of memory or is configured to manage its memory properly.
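If memory does become a problem, a single pass with `awk` keeps one counter per distinct value instead of sorting every token in the file. This is only a sketch, again assuming whitespace-separated fields and the same `{filename}` placeholder:

```sh
# Count every whitespace-separated field in one pass; memory use grows with
# the number of distinct values, not with the size of the file.
awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
     END { for (v in count) print v, count[v] }' {filename} | sort -k2,2nr
```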

ARWA ABDULKADER BASHANFAR: Thank you, but what if a value contains a space? For example, one value is `A C`, and as I understand it this code splits on spaces, so that value would be counted as two values: one A and one C. Is there any solution for this case?
Based on the type of data that is in the example, I'm going to assume that you are a student in some sort of medical or chemical science field. As such, allow me to share with you a *very important lesson*: state requirements up front. When people help a person only to hear "Oh, I have all these other things that need to be considered that I never told you", it becomes *very* easy for people to ignore all future requests. I work in education and have a lot of colleagues who have a lot of difficulty getting tech support because they never communicate everything up front.
ARWA ABDULKADER BASHANFAR: Thank you very much for your help and also for the advice. Yesterday I posted my real data here https://stackoverflow.com/questions/74992499/count-specific-value-over-all-dataframe-in-python/74992682?noredirect=1# and it ran for more than 10 hours, so I switched to Linux and thought any data I wrote here would be fine, but when I tried the suggested code it did not work for my data. So, can you please help me with my real data? It's the same data, but I converted the dataframe to CSV without the header and index. THANKS VERY MUCH AGAIN.
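For what it is worth, if the real file is comma-delimited (so a value such as `A C` sits in a single comma-separated field), splitting on commas instead of whitespace should keep it intact. This is only a sketch under that assumption, and it does not handle quoted fields:

```sh
# Assumes a plain comma-delimited CSV with no quoting; each comma-separated
# field (which may itself contain spaces) is counted as one value.
# The output lines come out in arbitrary order.
awk -F',' '{ for (i = 1; i <= NF; i++) count[$i]++ }
           END { for (v in count) print v, count[v] }' {filename}
```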