How to keep only the word after the third underscore in the 8th column?

I have a table (.tsv) like the following:

s__Methanobrevibacter_smithii   k__Archaea  p__Euryarchaeota    c__Methanobacteria  o__Methanobacteriales   f__Methanobacteriaceae  g__Methanobrevibacter   s__Methanobrevibacter_smithii
s__Methanosphaera_stadtmanae    k__Archaea  p__Euryarchaeota    c__Methanobacteria  o__Methanobacteriales   f__Methanobacteriaceae  g__Methanosphaera   s__Methanosphaera_stadtmanae
s__Candidatus_Methanomassiliicoccus_intestinalis    k__Archaea  p__Euryarchaeota    c__Thermoplasmata   o__Methanomassiliicoccales  f__Methanomassiliicoccaceae g__Methanomassiliicoccus    s__Candidatus_Methanomassiliicoccus_intestinalis
s__Actinobaculum_sp_oral_taxon_183  k__Bacteria p__Actinobacteria   c__Actinobacteria   o__Actinomycetales  f__Actinomycetaceae g__Actinobaculum    s__Actinobaculum_sp_oral_taxon_183
s__Actinomyces_graevenitzii k__Bacteria p__Actinobacteria   c__Actinobacteria   o__Actinomycetales  f__Actinomycetaceae g__Actinomyces  s__Actinomyces_graevenitzii

In the 8th column, I want to keep only the word after the third underscore (keeping the `s__` prefix) and remove everything else from that column. In addition, I want to remove the 4th underscore and everything after it in the first column, keeping the other columns as they are. I want to get output like the following:

s__Methanobrevibacter_smithii   k__Archaea  p__Euryarchaeota    c__Methanobacteria  o__Methanobacteriales   f__Methanobacteriaceae  g__Methanobrevibacter   s__smithii
s__Methanosphaera_stadtmanae    k__Archaea  p__Euryarchaeota    c__Methanobacteria  o__Methanobacteriales   f__Methanobacteriaceae  g__Methanosphaera   s__stadtmanae
s__Candidatus_Methanomassiliicoccus k__Archaea  p__Euryarchaeota    c__Thermoplasmata   o__Methanomassiliicoccales  f__Methanomassiliicoccaceae g__Methanomassiliicoccus    s__intestinalis
s__Actinobaculum_sp k__Bacteria p__Actinobacteria   c__Actinobacteria   o__Actinomycetales  f__Actinomycetaceae g__Actinobaculum    s__sp
s__Actinomyces_graevenitzii k__Bacteria p__Actinobacteria   c__Actinobacteria   o__Actinomycetales  f__Actinomycetaceae g__Actinomyces  s__graevenitzii
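The first-column rule is consistent across all five example rows, so it can be sketched on its own. This is a minimal Python sketch, not a definitive solution: it assumes the file really is tab-separated (the table above shows spaces, which is probably a copy-paste artifact), and the function names and the `fix_col1.py` file name are hypothetical.

```python
import sys


def truncate_at_nth_underscore(field: str, n: int = 4) -> str:
    """Remove the n-th underscore and everything after it.

    Fields with fewer than n underscores come back unchanged.
    """
    # k underscores split a string into k + 1 pieces, so keeping the first
    # n pieces keeps exactly the text before the n-th underscore.
    return "_".join(field.split("_")[:n])


def fix_first_column(line: str) -> str:
    """Apply the truncation to column 1 of one tab-separated line."""
    cols = line.rstrip("\n").split("\t")
    cols[0] = truncate_at_nth_underscore(cols[0])
    return "\t".join(cols)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python3 fix_col1.py input.tsv > output.tsv
    with open(sys.argv[1], encoding="utf-8") as fh:
        for line in fh:
            print(fix_first_column(line))
```

Splitting on `_` and rejoining the first four pieces avoids a regular expression, and rows with fewer than four underscores (such as `s__Methanobrevibacter_smithii`) pass through untouched.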

Can anyone please help me do that?

Many thanks

sudodus:
If the file is not too big, you can import it into a spreadsheet program, for example LibreOffice Calc, and manipulate the columns in its graphical interface.
deep771992:
The file is not too big. Do you have a tutorial on that?
sudodus:
No, I don't have a tutorial, but if you have ever used a spreadsheet program (for example Excel on Windows), it is fairly straightforward. You can probably find tutorials on the internet: try your web search engine with a search string like **import csv to LibreOffice Calc** until you find a helpful text.
Steeldriver:
Your text says that you want to *"keep only the word after the third underscore and remove everything from"* column 8; however, in the 1st line `s__Methanobrevibacter_smithii` becomes `s__smithii`, while in the 3rd line column 8 goes from `s__Candidatus_Methanomassiliicoccus_intestinalis` to `s__intestinalis`. Depending on whether you count the empty string between the two underscores of `s__` as a word, these are either the 3rd and 4th or the 4th and 5th words. Is it actually the *last* `_`-delimited word that you wish to retain?
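To make the ambiguity concrete, here is a small Python sketch of the two readings (the function names are hypothetical). Reading A keeps the single word after the third underscore; reading B keeps the last word. Neither reproduces all five expected rows: reading A turns row 3 into `s__Methanomassiliicoccus` rather than `s__intestinalis`, and reading B turns row 4 into `s__183` rather than `s__sp`.

```python
def after_third_underscore(field: str) -> str:
    """Reading A: 's__' + the single `_`-delimited word after the third underscore."""
    parts = field.split("_")  # "s__A_B" -> ["s", "", "A", "B"]
    return "s__" + parts[3] if len(parts) > 3 else field


def last_word(field: str) -> str:
    """Reading B: 's__' + the last `_`-delimited word."""
    return "s__" + field.rsplit("_", 1)[-1]


if __name__ == "__main__":
    # Rows 3 and 4 of column 8 from the question, where the readings disagree.
    for name in ("s__Candidatus_Methanomassiliicoccus_intestinalis",
                 "s__Actinobaculum_sp_oral_taxon_183"):
        print(name, after_third_underscore(name), last_word(name))
```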
deep771992:
Thanks, Steeldriver, for your response. Actually, I phrased the question wrong. Let me state it as simply as possible: I need "s__" + "the whole word after the third underscore". That way, `s__Candidatus_Methanomassiliicoccus_intestinalis` becomes `s__Methanomassiliicoccus_intestinalis`. Thanks
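Under this clarified rule, column 8 becomes `s__` plus everything after the third underscore, and column 1 still loses its fourth underscore and everything after it. The following is a minimal Python sketch, not a definitive solution. It assumes the file is genuinely tab-separated and that fields with fewer than three underscores should be left unchanged (the question does not say). The function names and the `fix_taxa.py` file name are hypothetical.

```python
import csv
import sys
from typing import List


def keep_after_third_underscore(field: str) -> str:
    """Return 's__' + everything after the third underscore.

    Assumption: fields with fewer than three underscores are left unchanged.
    """
    parts = field.split("_", 3)  # at most 4 pieces; the last keeps its underscores
    return "s__" + parts[3] if len(parts) == 4 else field


def drop_from_fourth_underscore(field: str) -> str:
    """Remove the fourth underscore and everything after it."""
    return "_".join(field.split("_")[:4])


def transform_row(row: List[str]) -> List[str]:
    """Rewrite columns 1 and 8; all other columns are kept as-is."""
    row = list(row)  # copy so the caller's list is not modified
    if row:
        row[0] = drop_from_fourth_underscore(row[0])
    if len(row) >= 8:
        row[7] = keep_after_third_underscore(row[7])
    return row


def main(path: str) -> None:
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh, delimiter="\t")
        writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
        for row in reader:
            writer.writerow(transform_row(row))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python3 fix_taxa.py input.tsv > output.tsv
    main(sys.argv[1])
```

Passing a maximum split count of 3 to `str.split` is what keeps the remainder (for example `Methanomassiliicoccus_intestinalis`) in one piece. Note that under this rule row 4's column 8 becomes `s__sp_oral_taxon_183`, not the `s__sp` shown in the original expected output.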