Score:1

Read from a text file and get the corresponding row from a csv file

I have a text file named train_ids.txt, and a csv file named dataset.csv.

The text file contains ids like this:

dish_1.png
dish_5.png

The input csv file has a lot of columns and rows, but the first column contains the ids. The first column looks like this:

dish_1 
dish_2 
dish_3 
dish_4 
dish_5 

I want to write a bash script that reads the ids from the text file, finds the rows in the csv file that have these ids, and writes each whole row to a new csv file as output.

So the output csv file should be like this:

dish_1  | whatever_1
dish_5  | whatever_5

Notes:

  • In the output csv file, "whatever" stands for the rest of the row, so the entire row should be copied
  • .png has to be removed from the ids in the text file before searching
  • The text file contains only ids as shown; there are no other kinds of lines
  • The ids text file is sorted, but the csv file isn't
  • Every id in the text file is certain to be in the csv file, so nothing needs to be skipped or reported.

I don't know how to do this; could you help me?

... similar to [Find match in csv file](https://askubuntu.com/a/1213425/178692)
terdon
Please [edit] your question and give us more detail. What is `whatever`? The entire row of the csv file? Do we need to remove the `.png` from the "id"s? Can there be other extensions? Can there be multiple `.` in a name (e.g. `foo.png.bar`)? Will the files be sorted so that line N in one file corresponds to line N in the other? Should lines with no matching entry be skipped or reported?
Abanoub Asaad
@terdon Just updated the question with additional notes.
terdon
Thanks, but please give us an example we can use to test our solutions. You have given us an IDs file with only two lines, and then a csv file with just one field, so we cannot produce your desired output based on your input. Also, you are showing an _unsorted_ ids file and a _sorted_ "csv" file but then say it is actually the opposite. Finally, what defines a field in your csv? Is it commas? Spaces? Tabs? Are the fields quoted? All of these are important to give you something you can actually use.
Score:1

You don't need a script for this, just use join. You haven't shown us what your real data are like, so I am guessing you have something like this:

$ cat train_ids.txt 
dish_1.png
dish_2.png
dish_3.png
dish_4.png
dish_5.png

and

$ cat dataset.csv
dish_2, whatever2
dish_5, whatever5
dish_4, whatever4
dish_3, whatever3
dish_1, whatever1

If so, you can get the output you want with:

$ join -t, <(sed 's/\.png$//' train_ids.txt) <(sort dataset.csv)
dish_1, whatever1
dish_2, whatever2
dish_3, whatever3
dish_4, whatever4
dish_5, whatever5
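
One caveat worth spelling out: join expects both inputs to be sorted on the join field, using the same collation. The sketch below is a slightly more defensive variant of the command above, not a tested drop-in for the real data. It assumes comma-separated fields with no quoted commas, fixes the locale to C, sorts the csv on its first field only, and writes the result to a file. The scratch directory, the intermediate `ids.sorted` and `dataset.sorted` files, the `output.csv` name and the extra `dish_10` row are all made up for illustration. Intermediate files are used instead of process substitution so that it also runs under plain sh.

```shell
#!/bin/sh
# Sketch only: sample data mirrors the example above, plus a dish_10 row.
set -eu
LC_ALL=C; export LC_ALL    # same collation for sort and join

workdir=$(mktemp -d)       # hypothetical scratch directory
printf '%s\n' dish_1.png dish_10.png dish_5.png > "$workdir/train_ids.txt"
printf '%s\n' 'dish_2, whatever2' 'dish_5, whatever5' 'dish_10, whatever10' \
    'dish_4, whatever4' 'dish_3, whatever3' 'dish_1, whatever1' > "$workdir/dataset.csv"

# Strip the .png suffix, then sort the ids.
sed 's/\.png$//' "$workdir/train_ids.txt" | sort > "$workdir/ids.sorted"
# Sort the csv on its first comma-separated field only.
sort -t, -k1,1 "$workdir/dataset.csv" > "$workdir/dataset.sorted"
# Keep the rows whose first field matches an id.
join -t, "$workdir/ids.sorted" "$workdir/dataset.sorted" > "$workdir/output.csv"

cat "$workdir/output.csv"
```

With this sample data the rows for dish_1, dish_10 and dish_5 end up in `output.csv`, in that order.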

And to get this as a pipe-separated file instead of a csv (comma separated file), you can do:

$ join -t, <(sed 's/\.png$//' train_ids.txt) <(sort dataset.csv) | sed 's/, / | /'
dish_1 | whatever1
dish_2 | whatever2
dish_3 | whatever3
dish_4 | whatever4
dish_5 | whatever5
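
If sorting the csv is undesirable, the same lookup can probably be done in a single awk pass. This is a minimal sketch under the same guesses about the data (comma-separated, id in the first field, no surrounding whitespace or quotes); the scratch directory and the `output.csv` name are hypothetical. awk first reads the ids file, strips `.png` and remembers each id, then prints every csv row whose first field is a remembered id.

```shell
#!/bin/sh
# Sketch only: two ids from the question, csv rows from the example above.
set -eu

workdir=$(mktemp -d)       # hypothetical scratch directory
printf '%s\n' dish_1.png dish_5.png > "$workdir/train_ids.txt"
printf '%s\n' 'dish_2, whatever2' 'dish_5, whatever5' 'dish_4, whatever4' \
    'dish_3, whatever3' 'dish_1, whatever1' > "$workdir/dataset.csv"

# NR == FNR is true only while the first file (the ids) is being read.
awk -F, '
    NR == FNR { sub(/\.png$/, ""); wanted[$0]; next }  # remember id without .png
    $1 in wanted                                       # print matching csv rows
' "$workdir/train_ids.txt" "$workdir/dataset.csv" > "$workdir/output.csv"

cat "$workdir/output.csv"
```

Unlike join, this keeps the rows in the order they appear in the csv file (here dish_5 before dish_1) and needs neither input to be sorted.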