
Using sed or awk to remove near-duplicates


I currently use the following to get as close as I can to a de-duplicated list of the messages in a log file:

cut -d ' ' -f 3- /var/log/issues.log | sed -E 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}//g' | sort -u

So far it gets rid of the timestamp at the start of each line and removes the IP address.

However, I'm still left with dozens of lines in the following format(s):

Failed login from for A
Failed login from for B
Failed login from for C
Failed login from for D
Failed login from for E
Invalid heartbeat 'A' from 
Invalid heartbeat 'B' from 
Invalid heartbeat 'C' from 
Invalid heartbeat 'D' from
Invalid heartbeat 'E' from

How would I further amend my command to remove these "near" duplicates? A, B, C, D and E could be any string. I would like to be left with only:

Failed login from for 
Invalid heartbeat from 

Thanks

Nate T
What is the input data, and what output are you trying to get? You might check [U&L]; if yours is a common use case, I'm guessing that someone has already asked there.
Philippos
Why not add `/Failed login from for/d;/Invalid heartbeat.*from/d` to your `sed` command?
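Note that a `/…/d` expression deletes the matching lines outright, whereas the question asks to keep one generic line per message type. A minimal sketch of the collapsing variant follows. It assumes the timestamp occupies the first two space-separated fields (as the `cut -f 3-` in the question implies) and that the messages look exactly like the samples shown; the function name `normalize` and the sample log lines are made up for illustration.

```shell
# Hypothetical helper: reads log lines on stdin, prints one generic line per message type.
normalize() {
  cut -d ' ' -f 3- |
    sed -E \
      -e 's/[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}//g' \
      -e 's/^(Failed login from) +for .*/\1 for/' \
      -e "s/^(Invalid heartbeat) '[^']*' (from).*/\1 \2/" \
      -e 's/ +/ /g; s/ $//' |
    sort -u
}

# Made-up sample input in the assumed "date time message" layout.
printf '%s\n' \
  "2024-01-01 10:00:01 Failed login from 10.0.0.1 for alice" \
  "2024-01-01 10:00:02 Failed login from 10.0.0.2 for bob" \
  "2024-01-01 10:00:03 Invalid heartbeat 'X1' from 10.0.0.3" \
  "2024-01-01 10:00:04 Invalid heartbeat 'Y2' from 10.0.0.4" |
  normalize
# Prints:
# Failed login from for
# Invalid heartbeat from
```

On the real file this would be run as `normalize < /var/log/issues.log`. The last `sed` expression squeezes the gap left by the removed IP address and trims trailing spaces, so lines that differ only in whitespace also collapse under `sort -u`.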