I have a log file as below:
12-02-2022 15:18:22 +0330 SOCK5.6699 00000 user144 97.251.107.125:38605 1.1.1.1:443 51766 169369 0 CONNECT 1.1.1.1:443
12-02-2022 15:18:27 +0330 SOCK5.6699 00094 user156 32.99.193.2:51242 1.1.1.1:443 715 388 0 CONNECT 1.1.1.1:443
12-02-2022 15:18:56 +0330 SOCK5.6699 00000 user105 191.184.66.98:40048 1.1.1.1:443 18105 29029 0 CONNECT 1.1.1.1:443
12-02-2022 15:18:56 +0330 SOCK5.6699 00000 user105 191.184.66.98:40070 1.1.1.1:443 674 26805 0 CONNECT 1.1.1.1:443
12-02-2022 15:20:24 +0330 SOCK5.6699 00000 user143 112.199.63.119:60682 1.1.1.1:443 475 445 0 CONNECT 1.1.1.1:443
12-02-2022 15:20:37 +0330 SOCK5.6699 00000 user105 191.184.66.98:40102 1.1.1.1:443 12913 18780 0 CONNECT 1.1.1.1:443
12-02-2022 15:20:42 +0330 SOCK5.6699 00000 user143 112.199.63.119:60688 1.1.1.1:443 4530 34717 0 CONNECT 1.1.1.1:443
12-02-2022 15:20:44 +0330 SOCK5.6699 00000 user127 212.167.145.49:2972 1.1.1.1:443 827 267 0 CONNECT 1.1.1.1:443
my goal is to extract two portions of this log file:
- Username
- IP address of the user source
below is a sample of the portions of data needed.
12-02-2022 15:18:22 +0330 SOCK5.6699 00000 user144 97.251.107.125:38605 1.1.1.1:443 51766 169369 0 CONNECT 1.1.1.1:443
So I wrote a Python script to extract both items and store them in separate lists and then joined them with zip function.
import pprint
import collections
iplist=[]
for l in data:
ip_port=l[53:71]
iplist.append(ip_port.split(':')[0])
userlist=[]
for u in data:
user=u[42:52]
userlist.append(user.replace(" ", ""))
a=list(zip(iplist,userlist))
most_ip=collections.Counter(a).most_common(5)
pprint.pprint(most_ip)
This code works fine, and I'm able to get the top used ip with its corresponding username.
Also need to mention that I didn't use re module, since it was listing the second IP (destination IP which is 1.1.1.1- which I don't care about it)
Question:
Is there any other way(more neat wey) than the way I've written the code?