I'm writing to ask about a problem I need to solve right now. I have a database of emails on my Linux virtual machine. The table includes fields such as ID, Spam, Data, Time, Sender_add, sender_ip, sender_domain, …
I'm doing a project on automatic whitelisting, so data preprocessing is very important. My problem is that I still don't know how to generate a whitelist database from that email database, because one domain can have many IP addresses. My job is to group them all (maybe with a script).
For example: gmail.com: 18.104.22.168, 22.214.171.124, 126.96.36.199, …..
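To make the grouping concrete, here is a minimal sketch of what such a script could look like. It assumes the table is called emailsl with sender_ip and sender_domain columns (as in the query further down), and it uses an in-memory SQLite table with made-up addresses as a stand-in so it can run anywhere. Against the real MySQL table the same SELECT DISTINCT query should apply, but the connection and cursor calls would depend on the MySQL driver used. The function name group_ips_by_domain is just an illustrative choice.

```python
import sqlite3
from collections import defaultdict


def group_ips_by_domain(conn):
    """Return {sender_domain: sorted list of distinct sender_ip values}."""
    cur = conn.execute(
        "SELECT DISTINCT sender_domain, sender_ip FROM emailsl "
        "WHERE sender_domain IS NOT NULL AND sender_ip IS NOT NULL"
    )
    groups = defaultdict(set)
    for domain, ip in cur:
        # Lower-case the domain so "Example.com" and "example.com" merge.
        groups[domain.lower()].add(ip)
    return {domain: sorted(ips) for domain, ips in groups.items()}


# In-memory stand-in for the real table; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emailsl (sender_ip TEXT, sender_domain TEXT)")
conn.executemany(
    "INSERT INTO emailsl VALUES (?, ?)",
    [
        ("192.0.2.1", "example.com"),
        ("192.0.2.2", "example.com"),
        ("192.0.2.1", "example.com"),  # duplicate pair, collapsed by DISTINCT
        ("198.51.100.7", "example.org"),
    ],
)

domain_ips = group_ips_by_domain(conn)
for domain, ips in sorted(domain_ips.items()):
    print(f"{domain}: {', '.join(ips)}")
```

This prints one line per domain in the same "domain: ip, ip, …" shape as the example above. In MySQL itself, GROUP BY sender_domain together with GROUP_CONCAT(DISTINCT sender_ip) might give a similar listing without any script.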
From those IP-domain pairs, I have to find a threshold to figure out which IPs are used for sending spam. The threshold could be "3 days" (for example), because spammers tend to use an IP to spread spam only for a short time. After removing the illegitimate IPs, we have the final whitelist to apply in an email system.
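One possible reading of the threshold idea, sketched in Python: for each domain-IP pair, take the time between the first and the last message seen from it, and keep the pair only if that span is at least the threshold. This is only an assumption about how the rule would work. The function name build_whitelist, the row layout (domain, IP, timestamp as a datetime), and the sample rows are all made up for illustration; how the real Data/Time columns turn into a datetime depends on how they are stored.

```python
from datetime import datetime, timedelta


def build_whitelist(rows, threshold=timedelta(days=3)):
    """Keep (domain, ip) pairs active for at least `threshold`.

    rows: iterable of (sender_domain, sender_ip, seen_at) tuples, where
    seen_at is a datetime. A pair's activity span is last seen minus
    first seen; pairs with a shorter span are treated as short-lived
    (possibly spam) and left out.
    """
    first_last = {}
    for domain, ip, seen_at in rows:
        key = (domain, ip)
        if key in first_last:
            first, last = first_last[key]
            first_last[key] = (min(first, seen_at), max(last, seen_at))
        else:
            first_last[key] = (seen_at, seen_at)
    return sorted(
        key for key, (first, last) in first_last.items()
        if last - first >= threshold
    )


# Invented sample rows: one long-lived IP and one seen for a single day.
rows = [
    ("example.com", "192.0.2.1", datetime(2024, 1, 1, 9, 0)),
    ("example.com", "192.0.2.1", datetime(2024, 1, 20, 9, 0)),
    ("example.com", "203.0.113.5", datetime(2024, 1, 2, 8, 0)),
    ("example.com", "203.0.113.5", datetime(2024, 1, 3, 8, 0)),
]

whitelist = build_whitelist(rows)
print(whitelist)
```

With these rows only ("example.com", "192.0.2.1") survives. Note that under this rule an IP seen only once is also dropped, so the rule would probably need to be combined with other information, such as the Spam field, before the result is trusted.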
So the only fields I care about are sender_ip and sender_domain. When I list the rows in MySQL (SELECT sender_ip, sender_domain FROM emailsl;), the result is more than 46,000 rows >.< so I can't do it manually by reading each line and noting down on paper which domain has which IP.
That's why I'm asking you for a method to solve this preliminary problem. This data preprocessing step is very important because it creates the DB for my whitelist, which could then be used in any email system.
What I have: email DB, Linux virtual machine, MySQL
What I want: to build a DB that shows the pairs of sender domain and legitimate IP
I hope you see my point and can help me with this.