Merging tab-delimited text files on a common column (header) in bash?

I have two text files containing millions of records, and all the records are tab-delimited. How can I merge these two files on a shared header (column)?

file1.txt:

    LogEntryTime              nameId       PartnerId
    2021-06-05T15:00:53 07    5lsddf        qyutxwr

file2.txt:

    nameId  GroupId  compnayId
    5lsddf  l4buafm   0rd33cs

Desired output:

    LogEntryTime              nameId       PartnerId    GroupId  compnayId
    2021-06-05T15:00:53 07    5lsddf        qyutxwr     l4buafm   0rd33cs

I tried this, but it doesn't work:

    paste file1.txt file2.txt | nameId -s $'\t' -t

and

    cat file1.txt file2.txt |  awk -F '\t' '{print $ list the all columns name here}'

The awk one works, but I have to list every column number in it.

Is there any other solution? Please help me out.

Thanks in advance

vanadium: I would probably use a database for that.

Vamshi Krishna CH: How can we achieve that in a shell script?

vanadium: I don't think this will be easy. It would take many loops, and it will be slow.

Vamshi Krishna CH: Would the same procedure apply to just hundreds of records?

vanadium: If the records are in the same order in all text files (i.e. record 2 of file1 matches record 2 of file2, etc.), then your awk command with paste will be enough. Better add that info to your question. I was assuming that the data need to be matched, for example, `nameId 5lsddf` is record 1 in file1 but record *x* in file2.

Vamshi Krishna CH: The common column should be cut from the second file and the rest merged with the first file. That's all.

Vamshi Krishna CH: Can you help with this?
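If vanadium's same-order assumption holds (line N of file1 and line N of file2 describe the same nameId), paste plus cut should be enough. Below is a minimal sketch under that assumption; the function name merge_same_order is hypothetical, and it assumes the question's layout (nameId in column 2 of file1 and column 1 of file2). It refuses to merge when the key columns differ on any line, rather than silently misaligning records.

```shell
# Hypothetical helper: merge two TSV files whose records are already in the same order.
# Assumes nameId is column 2 of file1 and column 1 of file2, as in the question.
merge_same_order() {
    keys=$(mktemp) || return 1
    cut -f2 "$1" > "$keys"                   # nameId column of file1 (header included)
    if cut -f1 "$2" | cmp -s "$keys" -; then # same keys, line for line?
        rm -f "$keys"
        cut -f2- "$2" | paste "$1" -         # append file2's other columns, tab-separated
    else
        rm -f "$keys"
        echo "merge_same_order: nameId columns differ; records are not aligned" >&2
        return 1
    fi
}
```

Usage: merge_same_order file1.txt file2.txt > merged.txt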

If your files are properly constructed tab-separated (TSV) files, then you can use csvjoin from the Python-based csvkit package.

For example, given:

    $ head file1.tsv file2.tsv | cat -A
    ==> file1.tsv <==$
    LogEntryTime^InameId^IPartnerId$
    2021-06-05T15:00:53 07^I5lsddf^Iqyutxwr$
    $
    ==> file2.tsv <==$
    nameId^IGroupId^IcompnayId$
    5lsddf^Il4buafm^I0rd33cs$

(cat -A makes the tabs visible as ^I), then:

    $ csvjoin -I -t -c nameId file1.tsv file2.tsv
    LogEntryTime,nameId,PartnerId,GroupId,compnayId
    2021-06-05T15:00:53 07,5lsddf,qyutxwr,l4buafm,0rd33cs

To get the output back in TSV format, use csvformat from the same package:

    $ csvjoin -I -t -c nameId file1.tsv file2.tsv | csvformat -T
    LogEntryTime    nameId  PartnerId       GroupId compnayId
    2021-06-05T15:00:53 07  5lsddf  qyutxwr l4buafm 0rd33cs

Note that -I disables type inference, which can otherwise behave unexpectedly, especially with datetime fields.


Even simpler is Miller (available from the universe repository as the miller package):

    $ mlr --tsv join -f file1.tsv -j nameId then reorder -f LogEntryTime file2.tsv
    LogEntryTime    nameId  PartnerId       GroupId compnayId
    2021-06-05T15:00:53 07  5lsddf  qyutxwr l4buafm 0rd33cs

The reorder is necessary because by default mlr join outputs the common field first (just like the system join command). Note that for unsorted input, the whole of file1.tsv will be loaded into memory.
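If loading file1 into memory is a concern with millions of records, the system join command can do the same inner join after sorting both files on the key, since sort can spill to temporary files on disk. A sketch, assuming GNU coreutils (join --header and the 0 field in -o are GNU features) and the question's exact three-column layout; join_sorted is a hypothetical name. The data rows are sorted while each header stays on line 1, and sort and join both run in the C locale so their collation agrees.

```shell
# Hypothetical helper: inner-join the question's two TSV files on nameId using sort + join.
# Assumes GNU coreutils (join --header and the "0" field in -o are GNU features)
# and the question's layout: nameId is field 2 of file1 and field 1 of file2.
join_sorted() {
    tab=$(printf '\t')
    tmp=$(mktemp -d) || return 1
    # Keep each header on line 1; sort only the data rows on the join key, in C collation.
    { head -n 1 "$1"; tail -n +2 "$1" | LC_ALL=C sort -t "$tab" -k2,2; } > "$tmp/1"
    { head -n 1 "$2"; tail -n +2 "$2" | LC_ALL=C sort -t "$tab" -k1,1; } > "$tmp/2"
    # -o restores the column order LogEntryTime, nameId, PartnerId, GroupId, compnayId.
    LC_ALL=C join --header -t "$tab" -1 2 -2 1 -o 1.1,0,1.3,2.2,2.3 "$tmp/1" "$tmp/2"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

Rows of either file whose nameId has no partner are dropped, as with the default mlr join.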

Read one of the files into an array keyed on the common field, then replace the first field of the second file (nameId) with the stored line from the first file that has the same key.

    awk -F \\t+ -vOFS=\\t 'NR==FNR{a[$2]=$0;next} {$1=a[$1]}1' file{1,2}.txt

With this particular set of data:

    awk '
        BEGIN {FS = OFS = "\t"}
        NR == FNR {f1[$2] = $0; next}
        {$1 = f1[$1]; print}
    ' file{1,2}.txt

Only the join field ($2 in file1, $1 in file2) is mentioned.

This produces the tab-separated output:

    LogEntryTime    nameId  PartnerId   GroupId compnayId
    2021-06-05T15:00:53 07  5lsddf  qyutxwr l4buafm 0rd33cs

For aligned output, pipe the result into column -t -s $'\t' to get:

    LogEntryTime            nameId  PartnerId  GroupId  compnayId
    2021-06-05T15:00:53 07  5lsddf  qyutxwr    l4buafm  0rd33cs
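Since the question asks to avoid listing column numbers, the same array technique can look up the key column by its header name instead of its position. A sketch, assuming each file's first line is a header, nameId values are unique in file1 (a later duplicate overwrites an earlier one), and file1 is non-empty and fits in memory; tsv_join_by_name is a hypothetical name. Unlike the one-liners above, file2 rows with no match in file1 are skipped instead of being printed with an empty first field.

```shell
# Hypothetical helper: join two TSV files on a column found by header name (default nameId).
# Assumes a header line in each file, unique keys in file1, and that file1 fits in memory.
tsv_join_by_name() {
    awk -v key="${3:-nameId}" '
        BEGIN { FS = OFS = "\t" }
        # First line of each file: find the position of the key column by name.
        FNR == 1 {
            k = 0
            for (i = 1; i <= NF; i++) if ($i == key) k = i
            if (k == 0) { print "no column " key " in " FILENAME > "/dev/stderr"; exit 1 }
            if (NR == FNR) k1 = k; else k2 = k
        }
        # file1: store each whole line (header included), indexed by its key value.
        NR == FNR { f1[$k1] = $0; next }
        # file2: append every non-key field to the matching file1 line; skip unmatched keys.
        ($k2 in f1) {
            out = f1[$k2]
            for (i = 1; i <= NF; i++) if (i != k2) out = out OFS $i
            print out
        }
    ' "$1" "$2"
}
```

Usage: tsv_join_by_name file1.txt file2.txt nameId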