Score:-3

How to sort the rows of a file as a matrix?

us flag

I would like to know how I can sort the rows of a file in the following way:

My file is file.txt (tab delimited):

g1 00A98_01563 00554_01552 CCUG38_01373 
g2 00554_01444
g3 00A98_04566 CCUG38_05322

I want to get this (tab delimited):

g 00A98 00554 CCUG38
g1 1 1 1
g2 0 1 0
g3 1 0 1

And/or also in this format (tab delimited):

g 00A98 00554 CCUG38
g1 00A98_01563 00554_01552 CCUG38_01373 
g2             00554_01444 
g3 00A98_04566             CCUG38_05322

How can I do this on the command line with sort, awk, grep, or another tool?

All the best, Regards

24601 avatar
in flag
Read [ask] and [edit] your question with information on what you have tried and how this relates to Ubuntu. This looks surprisingly like a homework question.
cn flag
What you want is a feature of spreadsheets, so https://www.google.com/sheets/about/ or LibreOffice would be the tool to use.
The_Bioinformatic_BATMAN avatar
us flag
Listed, edited!
Score:4
hr flag

Using Miller, treat the input as delimited key-value pairs, with TAB as the input field separator and underscore as the input pair separator, and set the output to TSV. You can then unsparsify your data:

$ mlr --idkvp --ifs tab --ips '_' --otsv unsparsify file.txt
1       00A98   00554   CCUG38
g1      01563   01552   01373
g2              01444
g3      04566           05322

You can then add further transformations, for example:

$ mlr --idkvp --ifs tab --ips '_' --otsv unsparsify --fill-with 0 then put '
    for(k,v in mapexcept($*,"1")){if(v != 0){$[k] = 1}}
  ' then rename "1","g" file.txt
g       00A98   00554   CCUG38
g1      1       1       1
g2      0       1       0
g3      1       0       1

or

$ mlr --idkvp --ifs tab --ips '_' --otsv unsparsify then put -S '
    for(k,v in mapexcept($*,"1")){if(v != ""){$[k] = k ."_". v}}
  ' then rename "1","g" file.txt
g       00A98   00554   CCUG38
g1      00A98_01563     00554_01552     CCUG38_01373
g2              00554_01444
g3      00A98_04566             CCUG38_05322

The alignment looks "off" in the last case, but outputting with --ocsv in place of --otsv should confirm that the fields are correct.
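Another way to check the layout without relying on visual alignment is to count the tab-separated fields on each line. The following is a minimal sketch using plain awk; the helper name `count_fields` is hypothetical and not part of Miller.

```shell
# Hypothetical helper: print "<line number>: <field count>" for tab-separated input.
# Reads the files given as arguments, or standard input when there are none.
count_fields() {
    # With a single-character FS such as a tab, awk keeps empty fields,
    # so a row with blank cells still reports the full column count.
    awk -F'\t' '{ print NR ": " NF }' "$@"
}
```

Piping any of the `mlr ... --otsv ...` commands above into `count_fields` should report the same field count on every line if the table is rectangular.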

The_Bioinformatic_BATMAN avatar
us flag
thanks a lot bro! you saved me a headache!
Score:2
cn flag

This is toMatrix.awk

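#!/usr/bin/gawk -f
# Requires gawk 4+ (uses arrays of arrays).
BEGIN { FS = OFS = "\t" }

{
    g[NR] = $1                    # row label; set even if the row has no other fields
    for (i=2; i<=NF; i++) {
        if ($i == "") continue    # skip empty fields left by trailing tabs
        x=$i
        sub(/_.*/, "", x)         # keep the prefix before the first underscore
        if (!(x in values)) {
            values[x] = 1
            ordered[++value] = x  # remember first-seen column order
        }
        data[NR][x]=1
    }
}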

END {
    printf "%s", "g"
    for (i = 1; i <= value; i++)
        printf "%s%s", OFS, ordered[i]
    print ""

    for (nr = 1; nr <= NR; nr++) {
        printf "%s", g[nr]
        for (i = 1; i <= value; i++)
            printf "%s%s", OFS, 0 + data[nr][ordered[i]]
        print ""
    }
}

$ gawk -f toMatrix.awk file.txt
g   00A98   00554   CCUG38
g1  1   1   1
g2  0   1   0
g3  1   0   1
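
The script above produces only the 0/1 matrix. For the second format requested in the question (keeping the full identifiers), a similar approach can be sketched in POSIX awk, so it should also run under mawk. The function name `to_wide` and its variable names are hypothetical, and the sketch assumes each prefix occurs at most once per row (a later duplicate would overwrite the earlier one).

```shell
# Hypothetical helper: print the wide table with the original IDs kept.
# Reads the files given as arguments, or standard input when there are none.
to_wide() {
    awk '
    BEGIN { FS = OFS = "\t" }
    {
        row[NR] = $1
        for (i = 2; i <= NF; i++) {
            if ($i == "") continue      # skip empty fields left by trailing tabs
            key = $i
            sub(/_.*/, "", key)         # prefix before the first underscore
            if (!(key in seen)) { seen[key] = 1; cols[++ncol] = key }
            cell[NR, key] = $i          # POSIX-style (row, column) subscript
        }
    }
    END {
        line = "g"
        for (c = 1; c <= ncol; c++) line = line OFS cols[c]
        print line
        for (r = 1; r <= NR; r++) {
            line = row[r]
            # A missing cell evaluates to the empty string, leaving a blank column.
            for (c = 1; c <= ncol; c++) line = line OFS cell[r, cols[c]]
            print line
        }
    }' "$@"
}
```

After defining the function in the shell, it can be called as `to_wide file.txt`. Columns appear in the order in which their prefixes are first seen in the input.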
The_Bioinformatic_BATMAN avatar
us flag
Thanks a lot bro! you saved me a headache!