Score:1

Extracting specific data from different tables and merging it

in flag

I would like to extract specific rows from specific columns from different tables, based on the column header. However, my knowledge seems to be too limited to solve it myself (both in command-line and python).

I have a directory with a growing number of .tsv files (OTU tables). These files all have the same layout:

  • Row 1 contains headers.
  • Rows 2 - x contain data. The number of columns can differ between files.
  • Column 1 always contains the OTU ids.
  • The last column always contains the Taxonomy.

text-formatted files can be found at the bottom

For example, file 1 looks like this:

OTU_id 20210801_sampleA 20210801_sampleB 20210801_sampleC Taxonomy
otuA 5 1 0 Hans is cool
otuB 2 0 0 Patricia has blue eyes
otuF 0 6 2 Rutger is actually Bart
otuZ 1 3 0 Felix was lost

File 2 looks like this:

OTU_id 20211111_sampleT 20211111_sampleG 20211111_sampleA Taxonomy
otuA 0 3 3 Hans is cool
otuB 2 1 0 Patricia has blue eyes
otuP 1 6 0 Thomas is great
otuT 2 22 56 Anna and Elsa
otuZ 1 8 4 Felix was lost

Of course, the number of samples and otus are different per file. As you can also see, the headers of the samples are always preceded by a date. In a certain file, this is always the same date.

What I want to achieve now is the following:

If I have a certain sample name, for example sampleA, I want to extract the results for this sample from all tables (.tsv files) in the directory. Of course sampleA should only be extracted from the files in which it is present.

In the first table it will find sampleA in the second column and it should take over the results for which the value in the below rows is larger than 0. In the second table, it will find sampleA in the 4th column.

The final table should look like this:

OTU_id 20210801_sampleA 20211111_sampleA Taxonomy
otuA 5 3 Hans is cool
otuB 2 0 Patricia has blue eyes
otuZ 1 4 Felix was lost
otuT 0 56 Anna and Elsa

SampleA has data for otuA,B,Z in the first file. These results are taken over (so the OTU_id, the values for header 20210801_sampleA and the Taxonomy).

As you will see, otuF was not extracted from the first table, because sampleA had a value 0 there. otuF also isn't present in table 2, so hence the total absence in the output file.

otuA and otuZ also had values > 0 in table 2, so these values were taken over.

otuB had a value of 0 for sampleA in table 2. Normally it would not extract it, but since it had been extracted from table 1, a 0 is added in that position.

otuT is only present in table 2 and thus was extracted there (OTU_id, value for 20211111_sampleA and Taxonomy). Because it was not found in table 1, a 0 is added in that position.

In the results file, the first column should thus also be the OTU_id. columns 2 to x should contain the data for sampleA, extracted from the files in the directory. The last column should contain the Taxonomy.

I hope you manage to understand everything. It has been bothering me for some time and lack the skills to figure it out. Please let me know if something isn't clear!

Thanks a lot for your help and sorry to bother you with this!

Cheers Guillaume

Text-formatted table 1.tsv:

OTU_id  20210801_sampleA    20210801_sampleB    20210801_sampleC    Taxonomy
otuA    5   1   0   Hans is cool
otuB    2   0   0   Patricia has blue eyes
otuF    0   6   2   Rutger is actually Bart
otuZ    1   3   0   Felix was lost

text-formatted table2.tsv

OTU_id  20211111_sampleT    20211111_sampleG    20211111_sampleA    Taxonomy
otuA    0   3   3   Hans is cool
otuB    2   1   0   Patricia has blue eyes
otuP    1   6   0   Thomas is great
otuT    2   22  56  Anna and Elsa
otuZ    1   8   4   Felix was lost

text-formatted output.tsv

OTU_id  20210801_sampleA    20211111_sampleA    Taxonomy
otuA    5   3   Hans is cool
otuB    2   0   Patricia has blue eyes
otuZ    1   4   Felix was lost
otuT    0   56  Anna and Elsa
muru avatar
us flag
Please post example input and expected output as code-formatted text, please: https://askubuntu.com/editing-help#code That makes it easier to test
in flag
Hi muru, I added these at the end of the post now. I hope this is how you meant it? Please let me know if I did it wrong.
muru avatar
us flag
Thanks, that's much better!
mangohost

Post an answer

Most people don’t grasp that asking a lot of questions unlocks learning and improves interpersonal bonding. In Alison’s studies, for example, though people could accurately recall how many questions had been asked in their conversations, they didn’t intuit the link between questions and liking. Across four studies, in which participants were engaged in conversations themselves or read transcripts of others’ conversations, people tended not to realize that question asking would influence—or had influenced—the level of amity between the conversationalists.