Monday, March 26, 2018

Phish Kit Finder

Source article: https://duo.com/blog/phish-in-a-barrel-hunting-and-analyzing-phishing-kits-at-scale

Quite often, a script kiddie who compromises a website will just upload and extract a .zip file containing their phishing kit.  This usually includes .php, .html, image, and sometimes helpful README files.  What they *don't* do is turn off the web server's directory listing feature, which leaves all of their files viewable to anyone who browses to the directory where they unpacked the zip.
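
Spotting these is easy, since an exposed listing is just an HTML index page.  Here's a minimal sketch of a check for one (the "Index of /" marker is Apache's stock listing title; other servers format theirs differently, so treat it as a heuristic):

import requests

def looks_like_open_listing(url):
    # Rough heuristic: Apache's default listing page is titled "Index of /".
    # nginx/IIS listings look different, so this will miss some servers.
    try:
        r = requests.get(url, timeout=3)
    except requests.RequestException:
        return False
    return r.status_code == 200 and "Index of /" in r.text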

By gathering phishing links, cutting them into parts, and checking each part for .zip files, you can sometimes find the original .zip they used.  Since the .zip isn't rendered by the browser, you can download it directly.  Inside, the .php files usually contain the attacker's destination e-mail address along with a subject line and other interesting tidbits, such as hacker group shout-outs, names/versions of the kit, etc...

Here's an example URL: 

http://codecrossroad.blogspot.com/this/is/a/phishing/link.html

It gets cut into the following URLs:

http://codecrossroad.blogspot.com/this/is/a/phishing/
http://codecrossroad.blogspot.com/this/is/a/
http://codecrossroad.blogspot.com/this/is/
http://codecrossroad.blogspot.com/this/
http://codecrossroad.blogspot.com/
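
The splitting itself is just peeling path segments off the end of the URL.  The full script below does this inline with a regex; here's a standalone sketch of the same idea (using urllib.parse instead, purely for illustration):

from urllib.parse import urlparse

def split_url(url):
    # Yield every parent path of a URL, deepest first.
    parsed = urlparse(url)
    base = parsed.scheme + "://" + parsed.netloc
    parts = [p for p in parsed.path.split("/") if p]
    # Drop the final component (the page itself), then walk back up.
    for i in range(len(parts) - 1, -1, -1):
        yield base + "/" + "/".join(parts[:i]) + ("/" if i else "")

list(split_url("http://codecrossroad.blogspot.com/this/is/a/phishing/link.html")) reproduces the list above.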


Each one is scraped and checked for "<something>.zip" in the response.  If a zip name turns up, the script attempts to download it at that path + the name of the .zip.  Ex:

http://codecrossroad.blogspot.com/this/is.zip


Direct downloads are also attempted at each split URL, by tacking .zip onto each path level.  Ex:

http://codecrossroad.blogspot.com/this/is/a/phishing.zip
http://codecrossroad.blogspot.com/this/is/a.zip
etc...


Link shorteners are a popular way to deliver malicious links in e-mails.  The expander() function attempts to resolve them before the URL splitting happens.
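
Note that expander() only follows a single hop, and shorteners are sometimes chained.  A hypothetical variant that walks a short redirect chain (capped so it can't loop forever) might look like:

import requests
from urllib.parse import urljoin

def expand_chain(url, max_hops=5):
    # Follow up to max_hops redirects manually, returning the final URL.
    for _ in range(max_hops):
        try:
            r = requests.get(url, allow_redirects=False, timeout=3)
        except requests.RequestException:
            break
        nxt = r.headers.get("Location")
        if r.status_code in (301, 302, 307, 308) and nxt:
            url = urljoin(url, nxt)  # Location may be a relative path
        else:
            break
    return url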

In initial testing, roughly 10% of the websites I scrape with this method still have the .zip uploaded.  Not too bad.


Enough talking, here's the code!


import io
import os
import re
import zipfile

import requests

# Input file: one suspected phishing URL per line.
urls = []
with open("threatUrls_out.csv") as f:
    for line in f:
        urls.append(line.strip())

def direct(url):
    try:
        r = re.findall(r'[^/]+', url)  # splits url into all parts
        starter = r[0] + "//" + r[1] + "/"  # starter = http://<site>/
        for el in range(len(r) - 3):
            starter = starter + r[el + 2] + ".zip"  # starter + next path level as a .zip
            data = requests.get(starter, timeout=3)
            if data.status_code == 200 and data.content[:2] == b'PK':  # zip magic bytes
                with open(r[1] + "_direct.zip", 'wb') as f:  # r[1] = domain
                    f.write(data.content)
                print("\tDirect zip found and downloaded!")
            starter = starter[:-4] + '/'  # removes .zip, adds / for next path level
        return 1
    except Exception as e:
        print("Error in direct():", str(e))
        return 0

def regex(url):
    try:
        r = re.findall(r'[^/]+', url)  # splits url into all parts
        starter = r[0] + "//" + r[1] + '/'  # starter = http://<site>/
        for el in range(len(r) - 3):
            starter = starter + r[el + 2] + '/'  # adds a slash at each path level
            data = requests.get(starter, timeout=3)
            if ".zip" in data.text:  # if there's a .zip anywhere in the html...
                regZipX = re.findall(r'([^\s<>="]+\.zip)', data.text)  # find all of them with a regex
                for zipName in set(regZipX):  # dedupe, then try each .zip found
                    try:
                        # try to get it at the listing path + the regexed zip name
                        zipData = requests.get(starter + zipName, timeout=3)
                        if zipData.status_code == 200 and zipData.content[:2] == b'PK':
                            # basename() keeps slashes in zipName from breaking open()
                            with open(r[1] + "_regex_" + os.path.basename(zipName), 'wb') as f:
                                f.write(zipData.content)
                            print("\tRegex zip found and downloaded!")
                    except Exception as e:
                        print("Error in regex() download:", str(e))
    except Exception as e:
        print("Error in regex():", str(e))

def emailParse():
    emails = set()
    path = '/home/me/phishkitfinder/'
    os.makedirs(path + 'processedKits/', exist_ok=True)
    for file in os.listdir(path):
        if '.zip' in file:
            print("opening", file)
            hit = re.search('(.*?)_', file)
            domain = hit.group(1)  # file names are <domain>_direct.zip / <domain>_regex_<name>
            with open(path + file, 'rb') as zipF:  # zips are binary; read in 'rb' mode
                zipContent = zipF.read()
            with zipfile.ZipFile(io.BytesIO(zipContent), 'r') as kit:  # 'kit', not the builtin 'zip'
                for fileName in kit.namelist():
                    with kit.open(fileName) as f:
                        contents = f.read().decode('utf-8', errors='ignore')
                    for h in re.findall(r'([\w_.+-]+@[\w-]+\.\w+)', contents):
                        emails.add((domain, h))
            os.rename(path + file, path + 'processedKits/' + file)
    print("writing e-mails to file...")
    with open('emailsHarvested.txt', 'a') as emailFile:
        for email in emails:
            emailFile.write(email[0] + "\t" + email[1] + "\n")

def expander(url):
    try:
        req = requests.get(url, allow_redirects=False, timeout=3)
        if req.status_code in (301, 302) and 'Location' in req.headers:  # shorteners redirect
            return req.headers['Location']
        return url
    except Exception as e:
        print("Error in expander():", str(e))
        return url
 
 
 
total = len(urls)
count = 0

for url in urls:
    count += 1
    print("on", count, "of", total, "(", url, ")")
    if len(url) < 30:  # short URLs are probably shortener links
        oldUrl = url
        url = expander(url)
        print("Original URL " + oldUrl + " expanded to " + url)
    direct(url)  # try direct grabs
    regex(url)  # try parsing pages for .zip links

emailParse()  # parse downloaded zips for e-mail addresses