Score:8

Compare new txt file with old txt file and remove all data that matches

au flag

I have a new file with the following data separated by a carriage return

a
a
b
c
d
d

I have an old file that is also separated by a carriage return

b
d

How do I remove b & d from the new file and remove one of the a's from the first file?

The desired output, separated by a carriage return, would be

a
c

I have tried sort -u, which removes the b & d but also removes both a's. I have also tried grep -vxFf; however, the duplicates from the new file are still there.

cn flag
Look into the `uniq` command to remove duplicates
Robert Carroll avatar
au flag
I did that with the `sort -u (uniq)` and it removes the a and a. I would like one of the a's to stay. It'd be better if there was a merge command maybe?
cn flag
You say "remove one of the a's". Does that mean if there are 3 a's, the result should have 2 of them?
Score:11
za flag
 grep -F -f oldfile -v newfile | uniq

Use oldfile as the list of search patterns for grep (-v keeps only the lines of newfile that do not match), then remove adjacent duplicate lines with uniq.
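As a minimal sketch using the sample data from the question (the scratch directory and the extra line "bad" are assumptions made for illustration): adding -x makes grep compare whole lines, which matters because -F alone matches substrings.

```shell
# Scratch directory with the question's sample data (file names are illustrative)
dir=$(mktemp -d)
printf '%s\n' a a b c d d > "$dir/newfile"
printf '%s\n' b d > "$dir/oldfile"

# -F fixed strings, -x whole-line match, -v invert, -f read patterns from a file
result=$(grep -vxFf "$dir/oldfile" "$dir/newfile" | uniq)
printf '%s\n' "$result"

# Without -x the patterns match substrings, so "bad" is dropped
# because it contains both b and d
loose=$(printf '%s\n' bad a | grep -vFf "$dir/oldfile")
# With -x only exact lines are removed, so "bad" survives
strict=$(printf '%s\n' bad a | grep -vxFf "$dir/oldfile")
printf 'loose: %s\nstrict: %s\n' "$loose" "$strict"
```

If the lines in your files never contain each other as substrings, both forms should give the same result.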

Robert Carroll avatar
au flag
Thank you!! that worked :) I really appreciate it.
vanadium avatar
cn flag
If this worked for you, then please show your appreciation by "accepting" it: click the checkmark next to the answer. This also shows other users of this site that a useful answer is available here. Feel free also to upvote other answers with different approaches that also work.
dedunumax avatar
kr flag
`grep -F -f oldfile -v newfile | sort | uniq` will guarantee unique results.
Robert Carroll avatar
au flag
Thank you @dedunu. I will add the sort and test it out. I appreciate all of the replies.
dedunumax avatar
kr flag
https://gist.github.com/dedunumax/38fc581d337df9b442f4bffce3960492 https://www.onlinegdb.com/LNJSGfsFB this might help!
Score:7
hr flag

Using awk, print only the lines of newfile that haven't previously occurred in either file:

awk '!(seen[$0]++ || NR==FNR)' oldfile newfile
Robert Carroll avatar
au flag
Thank you @steeldriver. Is this better than the grep I am using? I'll try it, but I am happy with grep.
hr flag
@RobertCarroll it's not necessarily *better*, but it is subtly *different* from the [currently accepted answer](https://askubuntu.com/a/1474183/178692) in that it will remove duplicates wherever they occur, without sorting, whereas `grep ... | uniq` will only remove *adjacent* duplicates, while `grep ... | sort | uniq` will remove all duplicates but may result in re-ordered output. So it depends what you want.
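To make the difference concrete, here is a small sketch with input where the duplicates are not adjacent (the sample data and file names are made up for illustration; -x is added to grep so that only whole lines match):

```shell
# Scratch directory; newfile has non-adjacent duplicates
dir=$(mktemp -d)
printf '%s\n' c b a c d a > "$dir/newfile"
printf '%s\n' b d > "$dir/oldfile"

# uniq alone only collapses adjacent duplicates, so repeats remain
with_uniq=$(grep -vxFf "$dir/oldfile" "$dir/newfile" | uniq)

# sort | uniq removes every duplicate but re-orders the output
with_sort=$(grep -vxFf "$dir/oldfile" "$dir/newfile" | sort | uniq)

# awk removes every duplicate and keeps first-occurrence order
with_awk=$(awk '!(seen[$0]++ || NR==FNR)' "$dir/oldfile" "$dir/newfile")

printf 'uniq:      %s\n' "$(echo $with_uniq)"   # c a c a
printf 'sort|uniq: %s\n' "$(echo $with_sort)"   # a c
printf 'awk:       %s\n' "$(echo $with_awk)"    # c a
```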
Score:3
cn flag

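If you can sort the files (which I assume, as you said you tried sort -u), you can run comm oldfile.sorted newfile.sorted, which shows the contents in three columns: old file only, new file only, both files. The -1, -2, -3 options suppress the corresponding columns, so comm -13 oldfile.sorted newfile.sorted prints only the lines unique to the new file. Because comm pairs lines one-to-one, a line repeated in the new file but present only once in the old file (the second d in your example) still appears in that output, so create the sorted files with sort -u rather than relying on a trailing uniq.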
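A rough sketch of the comm approach on the question's sample data (the scratch directory is an assumption; the .sorted names follow the ones used above). It also shows what happens when the new file is sorted without de-duplication: comm matches lines one-to-one, so the unmatched second d comes through.

```shell
dir=$(mktemp -d)
printf '%s\n' a a b c d d > "$dir/newfile"
printf '%s\n' b d > "$dir/oldfile"

# De-duplicate while sorting so each file behaves like a set
sort -u "$dir/oldfile" > "$dir/oldfile.sorted"
sort -u "$dir/newfile" > "$dir/newfile.sorted"

# -13 suppresses columns 1 and 3, leaving lines found only in the new file
only_new=$(comm -13 "$dir/oldfile.sorted" "$dir/newfile.sorted")
printf '%s\n' "$only_new"

# For comparison: plain sort of the new file, uniq applied afterwards.
# The second d has no partner in the old file, so it is reported as new.
plain=$(sort "$dir/newfile" | comm -13 "$dir/oldfile.sorted" - | uniq)
printf 'plain sort: %s\n' "$(echo $plain)"
```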

Score:1
it flag

Read man grep and do something like:

grep -F -f oldfile -v newfile
Robert Carroll avatar
au flag
That doesn't work either. I still have the issue of a and a both being in the updatedFile. `grep -vxFf oldfile newfile > updatedFile` produced the same result.
Score:1
jp flag

In perl:

perl -ne 'print if ! ( $seen{$_}++ || $#ARGV == 0 )' oldfile newfile

Or:

perl -ne '( $seen{$_}++ || $#ARGV == 0 ) || print' oldfile newfile
Score:0
lr flag

Use comm:

comm -13 <(sort -u old) <(sort -u new)
Score:0
lr flag

Though the grep answer works for you, sort and uniq can do the trick also.

This assumes that the construct <(command) is available (e.g. in bash). The shell replaces the construct with the name of a file (for example a /dev/fd entry or a named pipe) from which the output of command can be read.

This will work, with your example:

sort <(uniq new) old | uniq -u

Though more general would be:

sort <(uniq new) old old | uniq -u

Which also works if the old file contains lines that are not in new.

What happens is that the new file first has its duplicate lines removed. (This assumes that duplicates in the new file are adjacent. Otherwise replace <(uniq new) by <(sort -u new).)

Then the output of sort ... is taken as input to uniq -u.

uniq -u prints lines that are occurring only once.

Because the old file is presented twice to the sort ... command, none of the lines in old will make it to the output of uniq -u.

Also any lines common between old and new will not be present.
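The following sketch shows why old is listed twice. It assumes an old file with an extra line z that is not in new (made up for illustration), and it pipes uniq into sort via "-" instead of using <(...), so it should also run in shells without process substitution.

```shell
dir=$(mktemp -d)
printf '%s\n' a a b c d d > "$dir/new"
printf '%s\n' b d z > "$dir/old"

# old listed once: z occurs only once in the sorted stream, so uniq -u keeps it
once=$(uniq "$dir/new" | sort - "$dir/old" | uniq -u)

# old listed twice: every line of old occurs at least twice, so none survives
twice=$(uniq "$dir/new" | sort - "$dir/old" "$dir/old" | uniq -u)

printf 'old once:  %s\n' "$(echo $once)"    # a c z
printf 'old twice: %s\n' "$(echo $twice)"   # a c
```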



sort and uniq can be combined in several ways to perform mathematical set operations:

  1. Set union: f U g

    sort f g | uniq

  2. Set intersection: f ∩ g

    sort f g | uniq -d ## uniq -d only prints duplicate lines

  3. Set difference: f \ g

    sort f g g | uniq -u ## uniq -u only prints unique lines

Note that the files f and g represent sets. So they must not have duplicate lines. If that is not the case, replace e.g. f by <(sort -u f).
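A quick way to check the three recipes is to run them on two small files. The contents of f and g below are made-up sample sets, each without duplicate lines:

```shell
dir=$(mktemp -d)
printf '%s\n' a b c > "$dir/f"
printf '%s\n' b c d > "$dir/g"

# Union: every line that occurs in f or g, once
union=$(sort "$dir/f" "$dir/g" | uniq)

# Intersection: lines that occur in both files (printed once by uniq -d)
intersection=$(sort "$dir/f" "$dir/g" | uniq -d)

# Difference f \ g: g is given twice, so only lines exclusive to f are unique
difference=$(sort "$dir/f" "$dir/g" "$dir/g" | uniq -u)

printf 'union:        %s\n' "$(echo $union)"          # a b c d
printf 'intersection: %s\n' "$(echo $intersection)"   # b c
printf 'difference:   %s\n' "$(echo $difference)"     # a
```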
