If your files are properly constructed tab-separated (TSV) files, then you can use csvjoin from the Python-based csvkit package.
For example, given:
$ head file1.tsv file2.tsv | cat -A
==> file1.tsv <==$
LogEntryTime^InameId^IPartnerId$
2021-06-05T15:00:53 07^I5lsddf^Iqyutxwr$
$
==> file2.tsv <==$
nameId^IGroupId^IcompnayId$
5lsddf^Il4buafm^I0rd33cs$
(cat -A is used here to make the tabs visible, as ^I), then
$ csvjoin -I -t -c nameId file1.tsv file2.tsv
LogEntryTime,nameId,PartnerId,GroupId,compnayId
2021-06-05T15:00:53 07,5lsddf,qyutxwr,l4buafm,0rd33cs
To get the output back in TSV format, use csvformat from the same package:
$ csvjoin -I -t -c nameId file1.tsv file2.tsv | csvformat -T
LogEntryTime nameId PartnerId GroupId compnayId
2021-06-05T15:00:53 07 5lsddf qyutxwr l4buafm 0rd33cs
Note that -I disables type inference, which can sometimes behave unexpectedly, especially with datetime fields.
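One reason inference can surprise: once a field has been parsed into a typed value, writing it back out need not reproduce the original text. The snippet below is a generic Python illustration of that effect, not csvkit's actual inference code; the helper name roundtrip is made up for this example.

```python
from datetime import datetime


def roundtrip(text):
    """Parse text as an ISO 8601 datetime if possible, then render it as a string.

    Hypothetical helper for illustration only; csvkit's real inference differs.
    """
    try:
        return str(datetime.fromisoformat(text))
    except ValueError:
        # Not a datetime: leave the text untouched, as with inference disabled.
        return text


print(roundtrip("2021-06-05T15:00:53"))  # 2021-06-05 15:00:53 (the "T" is gone)
print(roundtrip("5lsddf"))               # 5lsddf
```

With inference disabled, every field is passed through as the literal string found in the input, so this kind of rewrite cannot happen.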
Even simpler, using Miller (available from the universe repository as the package miller):
$ mlr --tsv join -f file1.tsv -j nameId then reorder -f LogEntryTime file2.tsv
LogEntryTime nameId PartnerId GroupId compnayId
2021-06-05T15:00:53 07 5lsddf qyutxwr l4buafm 0rd33cs
The reorder is necessary because by default mlr join outputs the common field first (just like the system join command). Note that for unsorted input, the whole of file1.tsv will be loaded into memory.
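To see why an unsorted join needs one file in memory, here is a minimal Python sketch of the same idea using only the standard library. It is not how Miller is implemented. The function name join_tsv and its parameters are assumptions made for this example, and it assumes plain TSV with a header row and no field names shared between the files other than the key.

```python
import csv
import sys


def join_tsv(left_path, right_path, key, out=sys.stdout):
    """Join two TSV files on `key`, writing left-file fields first (hypothetical helper)."""
    # Load the whole left file into memory, indexed by the join key.
    with open(left_path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        left_fields = reader.fieldnames
        left_rows = {}
        for row in reader:
            left_rows.setdefault(row[key], []).append(row)

    # Stream the right file one row at a time and look up matches.
    with open(right_path, newline="") as f:
        reader = csv.DictReader(f, delimiter="\t")
        extra = [name for name in reader.fieldnames if name != key]
        writer = csv.writer(out, delimiter="\t", lineterminator="\n")
        writer.writerow(left_fields + extra)
        for row in reader:
            for match in left_rows.get(row[key], []):
                writer.writerow(
                    [match[name] for name in left_fields]
                    + [row[name] for name in extra]
                )
```

Calling join_tsv("file1.tsv", "file2.tsv", "nameId") on the files above should print the same header and data line as the mlr command. Because the left file's columns are written in their original order, no separate reorder step is needed in this sketch, and all values stay as strings, so there is no type inference to disable.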