Score:0

Need help to figure out a regex to modify some files


I have some files in which I need to clean up some names.

for example:

GCA_940670685.1_Clostridium_sp_chr  3757330
GCA_940677205.1_Clostridium_colinum_chr 2035557
GCA_942548115.1_Aeromicrobium_sp_chr    3463989
GCA_943169635.1_Fenollaria_sp_chr   3260126
GCA_943169825.1_Varibaculum_sp_chr  4423380
GCA_943736995.1_Sporosarcina_sp_chr 3771420

And I need something like this:

GCA_940670685.1 3757330
GCA_940677205.1 2035557
GCA_942548115.1 3463989
GCA_943169635.1 3260126
GCA_943169825.1 4423380
GCA_943736995.1 3771420

I tried to use:

sed 's/_[A-Za-z]+_//gI' Terrabacteria_chr_lengths.tsv

sed 's/\w+_\w+_chr//gI' Terrabacteria_chr_lengths.tsv

find Results/Lengths/Bacteria -type f -exec sed -i 's/_\w+_\w+_chr//g' {} \;

But nothing seems to work, I think due to my poor regex skills, e.g. \w+\w+_chr.

Any suggestions would be appreciated. Thank you.

Paulo

Score:1

The biggest issue is that + does not act as a quantifier in a sed basic regular expression (BRE): you need to switch to extended regular expression (ERE) mode using -E or -r to use it (or change + to \{1,\} for a POSIX BRE version¹).

Beyond that, you seem to want to match a sequence of alphabetic characters and underscores after the initial underscore (but not ending with an underscore). So either:

sed -E 's/_[A-Za-z_]+//'

or

sed 's/_[A-Za-z_]\{1,\}//'

You don't need the g modifier, since you're making a single substitution per line.
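As a quick check, here is a minimal sketch that runs both forms on two of the sample lines from the question. It assumes the columns are tab-separated (the file is a .tsv); the variable names are only for illustration.

```shell
# Two sample lines from the question, assumed to be tab-separated.
sample=$(printf '%s\t%s\n' \
  'GCA_940670685.1_Clostridium_sp_chr' 3757330 \
  'GCA_940677205.1_Clostridium_colinum_chr' 2035557)

# ERE form: + is a quantifier because of -E.
result_ere=$(printf '%s\n' "$sample" | sed -E 's/_[A-Za-z_]+//')

# POSIX BRE form: \{1,\} plays the role of +.
result_bre=$(printf '%s\n' "$sample" | sed 's/_[A-Za-z_]\{1,\}//')

printf '%s\n' "$result_ere"
```

The underscore after GCA is left alone because it is followed by a digit, so the first match on each line starts at the underscore before the genus name and runs up to the tab.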


1 GNU sed actually supports \+ as a quantifier in BRE, but IMHO that just adds to the confusion.

Paulo Sergio Schlogl
Thank you very much.