Score:1

How to select specific columns in a file alternating between lines


I have a text file in which each line of protein sequence information is followed by a line with the related sequence.

>4YDY_1|Chains A, C[auth B]|DARPIN 44C12V5|synthetic construct (32630)
MRGSHHHHHHGSDLGKKLLEAARAGQDDEVRILMANGADVNALDDSGYTPLHLAAEDGHLEIVEVLLKHGADVNAADRLGDTPLHLAAFVGHLEIVEVLLKAGADVNAVDLAGVTPLHVAAFYGHLEIVEVLLKAGADVNAQDKFGKTPADIAADNGHEDIAEVLQKLN

Each information line lists the chains that share the sequence on the next line. I want to go through the file and, on every information line (that is, every other line), keep only the ID and the first chain: remove the entity number that comes right after the ID (_1), put a colon between the ID and the chain, and remove everything else on the line. Some sequences (the letters on the second line) also have fewer than 50 letters. I want to remove every sequence with fewer than 50 letters, along with its ID, which is the line above it.

To be clear, this is the output I want for every sequence in the file:

>4YDY:A
MRGSHHHHHHGSDLGKKLLEAARAGQDDEVRILMANGADVNALDDSGYTPLHLAAEDGHLEIVEVLLKHGADVNAADRLGDTPLHLAAFVGHLEIVEVLLKAGADVNAVDLAGVTPLHVAAFYGHLEIVEVLLKAGADVNAQDKFGKTPADIAADNGHEDIAEVLQKLN

Thank you in advance.

Ray
For what you want to do, you probably want to turn to Perl or Python. Perhaps someone else can offer you help with `bash`, but I think it would be fairly difficult.
Score:0

This is tested and works with your example.

#!/bin/bash
# Read the file two lines at a time
while read -r one; do
   read -r two
   # If the second line is fifty or more characters long
   if ((${#two} >= 50)); then
     IFS='|' read -ra f <<< "$one"
     id="${f[0]}"
     # Remove the underscore "_" and everything after it from the ID
     id=${id%_*}
     # Grab the first chain
     chain="${f[1]}"
     chain=$(cut -d ' ' -f2 <<<"$chain" | cut -d ',' -f1)
     one="$id:$chain"
     # Print the two lines in the desired format
     printf '%s\n' "$one" "$two"
   fi
done < file.txt
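
For comparison, here is a rough sketch of the same idea using only shell parameter expansion instead of `cut`, wrapped in a function that reads from standard input. The function name `filter_fasta` and its optional minimum-length argument are my own, hypothetical choices and are not part of the script above. It assumes the file strictly alternates header and sequence lines, with no blank lines in between. Unlike the `cut` version, it keeps everything up to the first comma of the chain field, so a first chain written as `A[auth B]` would be kept whole.

```shell
#!/bin/bash
# Hypothetical helper (not from the original answer): reads two-line records
# (header line, sequence line) from stdin and prints the reformatted records.
# Optional argument: minimum sequence length to keep (defaults to 50).
filter_fasta() {
  local min_len=${1:-50} header seq id chain
  while IFS= read -r header; do
    # Read the sequence line that follows the header
    IFS= read -r seq
    # Skip the whole record if the sequence is too short
    [ "${#seq}" -ge "$min_len" ] || continue
    id=${header%%|*}     # first |-separated field, e.g. ">4YDY_1"
    id=${id%_*}          # drop the entity number, e.g. ">4YDY"
    chain=${header#*|}   # drop the first field
    chain=${chain%%|*}   # keep the second field, e.g. "Chains A, C[auth B]"
    chain=${chain#* }    # drop the leading word "Chain" or "Chains"
    chain=${chain%%,*}   # keep only the first chain, e.g. "A"
    printf '%s:%s\n%s\n' "$id" "$chain" "$seq"
  done
}
```

After defining or sourcing the function, it could be used as `filter_fasta < file.txt > filtered.txt`.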