Score:0

Ubuntu

Merging tab-delimited txt files based on a column (which is a header) in bash?

Vamshi Krishna CH

10/11/22, 5:41 AM

I have two text files, each containing a million records. All the records are tab-delimited. How can I merge these two files based on a shared header (column)?

file1.txt:

    LogEntryTime              nameId       PartnerId        
    2021-06-05T15:00:53 07    5lsddf        qyutxwr

file2.txt:

    nameId  GroupId  compnayId
    5lsddf  l4buafm  0rd33cs

Expected output:

    LogEntryTime              nameId       PartnerId    GroupId  compnayId
    2021-06-05T15:00:53 07    5lsddf        qyutxwr     l4buafm   0rd33cs

I tried this, but it doesn't work:

paste file1.txt file2.txt | nameId -s $'\t' -t

and

cat file1.txt file2.txt |  awk -F '\t' '{print $ list the all columns name here}'

The awk one works, but I need to list all the column numbers there.

Is there any other solution? Please help me out.

Thanks in advance.

command-line

bash

scripts

vanadium

10/11/22, 6:17 AM

I would probably use a database for that.

Vamshi Krishna CH

10/11/22, 6:41 AM

How can we achieve that in a shell script?

vanadium

10/11/22, 6:42 AM

I don't think this will be easy. Many loops, and it will be slow.

Vamshi Krishna CH

10/11/22, 6:57 AM

Should the same procedure be used for just hundreds of records as well?

vanadium

10/11/22, 7:02 AM

If the order of the records is identical in all text files (i.e. record 2 of file1 matches record 2 of file2, etc.), then your awk command with paste will do the job. Better add that info to your question. I was assuming that the data need to be matched; for example, `nameId 5lsddf` is record 1 in file1 but record *x* in file2.
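
If the rows are already in the same order in both files, one minimal sketch is to cut the duplicate nameId column out of file2 and paste the rest onto file1. It assumes nameId is file2's first column; the function name merge_same_order and the sample files in a temporary directory are hypothetical.

```shell
#!/bin/sh
# Sketch: merge two tab-separated files whose rows are already in the same order.
# Assumption: record N of file1 matches record N of file2, and nameId is file2's first column.
set -eu

merge_same_order() {
  # cut -f2- drops file2's first (duplicate nameId) column; cut splits on tabs by default.
  # paste glues line N of file1 to line N of that output ("-" means stdin), tab-separated.
  cut -f2- "$2" | paste "$1" -
}

# Hypothetical sample data mirroring the question, written to a throwaway directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'LogEntryTime\tnameId\tPartnerId\n2021-06-05T15:00:53 07\t5lsddf\tqyutxwr\n' > "$dir/file1.txt"
printf 'nameId\tGroupId\tcompnayId\n5lsddf\tl4buafm\t0rd33cs\n' > "$dir/file2.txt"

merge_same_order "$dir/file1.txt" "$dir/file2.txt"
```

Because this does no matching at all, rows that are out of step between the two files get merged silently and wrongly; the keyed approaches in the answers avoid that.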

Vamshi Krishna CH

10/11/22, 7:48 AM

The same column will be cut from the second file and merged with the first file. That's all.

Vamshi Krishna CH

10/11/22, 7:48 AM

Can you help with this?

Score:2

steeldriver

10/11/22, 11:43 AM

If your files are properly constructed tab-separated (TSV) files, then you can use csvjoin from the Python-based csvkit package.

For example, given:

$ head file1.tsv file2.tsv | cat -A
==> file1.tsv <==$
LogEntryTime^InameId^IPartnerId$
2021-06-05T15:00:53 07^I5lsddf^Iqyutxwr$
$
==> file2.tsv <==$
nameId^IGroupId^IcompnayId$
5lsddf^Il4buafm^I0rd33cs$

(cat -A makes the tabs visible, as ^I), then

$ csvjoin -I -t -c nameId file1.tsv file2.tsv
LogEntryTime,nameId,PartnerId,GroupId,compnayId
2021-06-05T15:00:53 07,5lsddf,qyutxwr,l4buafm,0rd33cs

To get the output back in TSV format, use csvformat from the same package:

$ csvjoin -I -t -c nameId file1.tsv file2.tsv | csvformat -T
LogEntryTime    nameId  PartnerId       GroupId compnayId
2021-06-05T15:00:53 07  5lsddf  qyutxwr l4buafm 0rd33cs

Note that -I disables type inference, which can otherwise behave unexpectedly, especially with datetime fields.

Even simpler, using Miller (available from the universe repository, as package miller):

$ mlr --tsv join -f file1.tsv -j nameId then reorder -f LogEntryTime file2.tsv
LogEntryTime    nameId  PartnerId       GroupId compnayId
2021-06-05T15:00:53 07  5lsddf  qyutxwr l4buafm 0rd33cs

The reorder is necessary because by default mlr join outputs the common field first (just like the system join command). Note that for unsorted input, the whole of file1.tsv will be loaded into memory.
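
For comparison, the same join can be sketched with the system join command mentioned above; this is a swapped-in technique, not part of this answer. join needs both inputs sorted on the key, so the sketch sorts everything except the header line first. It assumes GNU coreutils join (for --header), nameId in field 2 of file1 and field 1 of file2, and uses hypothetical helper names sort_body and join_on_nameid.

```shell
#!/bin/sh
# Sketch: join two TSV files on nameId with GNU coreutils join (swapped in; not Miller or csvkit).
# Assumptions: GNU join (for --header); nameId is field 2 of file1 and field 1 of file2.
set -eu

tab=$(printf '\t')

# Hypothetical helper: print the header line as-is, then the remaining lines
# sorted on field $2. LC_ALL=C keeps sort and join on the same collation order.
sort_body() {
  head -n 1 "$1"
  tail -n +2 "$1" | LC_ALL=C sort -t "$tab" -k "$2,$2"
}

# Hypothetical helper: inner join; unmatched lines from either file are dropped.
join_on_nameid() {
  tmp=$(mktemp -d)
  sort_body "$1" 2 > "$tmp/left"
  sort_body "$2" 1 > "$tmp/right"
  # -1 2 / -2 1: key column in each file. --header treats the first line of
  # each file as headers and prints them joined, without trying to pair keys.
  LC_ALL=C join --header -t "$tab" -1 2 -2 1 "$tmp/left" "$tmp/right"
  rm -rf "$tmp"
}

# Hypothetical sample data mirroring the question.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'LogEntryTime\tnameId\tPartnerId\n2021-06-05T15:00:53 07\t5lsddf\tqyutxwr\n' > "$dir/file1.txt"
printf 'nameId\tGroupId\tcompnayId\n5lsddf\tl4buafm\t0rd33cs\n' > "$dir/file2.txt"

join_on_nameid "$dir/file1.txt" "$dir/file2.txt"
```

As with mlr join, the join field comes out first, so a reorder step would still be needed to put LogEntryTime back in front. The upside is that sort works on disk-backed runs, so neither file has to fit in memory.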

Score:2

bac0n

10/11/22, 11:18 AM

Read one of the files into an array indexed by the common field, then replace the first field of the second file (which is nameId) with the array element stored under that same key.

awk -F \\t+ -vOFS=\\t 'NR==FNR{a[$2]=$0;next} {$1=a[$1]}1' file{1,2}.txt
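
One caveat with this lookup: a file2 record whose nameId does not occur in file1 is still printed, with an empty first field. A minimal sketch of an inner-join variant that prints only matched records, assuming the same column positions (nameId is $2 in file1, $1 in file2) and single tabs between fields; the name inner_join and the extra unmatched zzzzzz row are hypothetical.

```shell
#!/bin/sh
# Sketch: inner-join variant of the array lookup; file2 records whose nameId
# is missing from file1 are skipped instead of printed with an empty first field.
# Assumptions: nameId is $2 in file1 and $1 in file2; fields separated by single tabs.
set -eu

inner_join() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == FNR { a[$2] = $0; next }   # file1: store each whole line under its nameId
       ($1 in a) { $1 = a[$1]; print }  # file2: swap the key for the stored file1 line
      ' "$1" "$2"
}

# Hypothetical sample data; the zzzzzz row has no partner in file1.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'LogEntryTime\tnameId\tPartnerId\n2021-06-05T15:00:53 07\t5lsddf\tqyutxwr\n' > "$dir/file1.txt"
printf 'nameId\tGroupId\tcompnayId\n5lsddf\tl4buafm\t0rd33cs\nzzzzzz\tgrpX\tcmpX\n' > "$dir/file2.txt"

inner_join "$dir/file1.txt" "$dir/file2.txt"
```

With this sample data the zzzzzz row is dropped and only the header and the 5lsddf record are printed. Using a single "\t" as FS instead of \t+ also keeps empty fields in place rather than collapsing them.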

Score:1

glenn jackman

10/11/22, 2:45 PM

With this particular set of data:

awk '
    BEGIN {FS = OFS = "\t"}
    NR == FNR {f1[$2] = $0; next}
    {$1 = f1[$1]; print}
' file{1,2}.txt

Only the join field ($2 in file1, $1 in file2) is mentioned.
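
To avoid hard-coding even the join field's position, the key column could be looked up by its header name. This is a variation on the awk above, not part of the answer. It assumes both files are tab-separated, the key name appears once in each header, and key values are unique in file1; join_by_name is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: join two TSV files on a column named in the header, not by position.
# Assumptions: the key name occurs once in each header; key values are unique in file1.
# join_by_name is a hypothetical helper, not part of any package.
set -eu

join_by_name() {
  awk -v key="$1" '
    BEGIN { FS = OFS = "\t" }
    FNR == 1 {                       # header of either file: find the key column
      k = 0
      for (i = 1; i <= NF; i++) if ($i == key) k = i
      if (!k) { print "no column named " key > "/dev/stderr"; exit 1 }
    }
    NR == FNR { f1[$k] = $0; next }  # first file: keep each line under its key value
    ($k in f1) {                     # second file: only keys also present in the first
      line = f1[$k]
      for (i = 1; i <= NF; i++) if (i != k) line = line OFS $i
      print line
    }
  ' "$2" "$3"
}

# Hypothetical sample data mirroring the question.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'LogEntryTime\tnameId\tPartnerId\n2021-06-05T15:00:53 07\t5lsddf\tqyutxwr\n' > "$dir/file1.txt"
printf 'nameId\tGroupId\tcompnayId\n5lsddf\tl4buafm\t0rd33cs\n' > "$dir/file2.txt"

join_by_name nameId "$dir/file1.txt" "$dir/file2.txt"
```

Because the key position is read from each header, the same call should also work if nameId sits in a different column of file2. Like the other awk answers, it keeps all of file1 in memory.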

This produces the tab-separated output:

LogEntryTime    nameId  PartnerId   GroupId compnayId
2021-06-05T15:00:53 07  5lsddf  qyutxwr l4buafm 0rd33cs

For pretty output, pipe the result through column -t -s $'\t' to get:

LogEntryTime            nameId  PartnerId  GroupId  compnayId
2021-06-05T15:00:53 07  5lsddf  qyutxwr    l4buafm  0rd33cs
