finding sub domains with search engines

Posted: May 5, 2013 in general, security
Tags: , , , ,

Finding sub domains using DNS is common practice, for example fierce does a pretty nice job. Additionally fierce presents a nice overview of the possible ranges that belong to your target. For some odd reason I also like to find sub domains using search engines, even though this will deliver results that are far from exhaustive. In the past I wrote a perl script to do this, but since I’m becoming a fan of python I decided to rewrite it in python. For example using python-requests and beautifulsoup it only takes like ~10 lines to scrape the sub domains from a search engine page:

def getgoogleresults(maindomain,searchparams):
    regexword = r'(http://|https://){0,1}(.*)' + maindomain.replace('.','\.')
    try:
        content = requests.get(googlesearchengine,params=searchparams).content
    except:
        print >> sys.stderr, 'Skipping this search engine'
        return
    soup = BeautifulSoup(content)
    links = soup.find_all('cite')
    extract = re.compile(regexword)
    for i in links:
        match = extract.match(i.text)
        if match:
            res = match.group(2).strip() + maindomain
            if res not in subdomains:
                subdomains.append(res)

This script doesn’t parse all the result pages from the search engines. Actually it only parses the first page. This is because I wanted to keep it simple for the moment being and it helps to not get blocked that quickly. To compensate for the lack of crawling the results, the script uses multiple search engines and negates the results from one engine onto another.  For example it performs queries like:

site:somedomain.tld -site:subdomain1.somedomain.tld

As said it compensates somewhat for the lack of crawling the results pages but it will surely fail to find all sub domains indexed on the search engines. This is how it looks like:

searchsubdomain.py hacktalk.net
blog.hacktalk.net
leaks-db.hacktalk.net
ns2.hacktalk.net
www.hacktalk.net

Which is exactly the moment when I realised I’d also would like the ip addresses that belong to the found domains. I wrote a separate script for that which uses the adns python bindings. This is how it looks like:

searchsubdomain.py hacktalk.net | dnsresolver.py 
ns2.hacktalk.net 209.190.32.59
www.hacktalk.net 209.190.32.59
leaks-db.hacktalk.net 209.190.32.59
blog.hacktalk.net 209.190.32.59

If you wonder why I wrote a new script that uses adns:

real 0m46.962s
user 0m0.904s
sys 0m0.180s

That’s the time it took to resolve 2280 hosts including a couple of 3 second delays to not hog the DNS server. Also for tasks like this (brute forcing sub domains with DNS) bash is your friend:

for i in `cat hosts.txt`;do echo $i”.hacktalk.net” >> hacktalkdomains.txt;done
dnsresolver.py hacktalkdomains.txt | grep -vi resverror

I copied the two scripts to my /usr/local/bin directory to be able to use them from anywhere on the cli. You can find them over here: https://github.com/DiabloHorn/DiabloHorn/tree/master/misc

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s