python – DiabloHorn

Parsing atop files with python dissect.cstruct

Like you’ve probably read, Fox-IT released their incident response framework called dissect, but before that they released the cstruct part of their framework. Ever since they released it publicly I’ve been wanting to find an excuse to play with it on public projects. I witnissed the birth of cstruct back when I was still working at Fox-IT and am very happy to see it all has finally been made public, it sure has evolved since I had a look at the very first version! Special thanks to Erik Schamper (@Schamperr) for answering late night questions about some of the inner workings of dissect.cstruct.

This is one of those things that you can encounter during your incident response assignment and for which life is a bit easier if you can just parse the binary file format with python. Since with incident response you never know in which format exactly you want to receive the data for analysis or what you are looking for it really helps to work with tools that can be rapidly adjusted. python is an ideal environment to achieve this. An added benefit of parsing the structures ourselves with python is that we can avoid string parsing and thus avoid confusion and mistakes.

The atop tool is a performance monitoring tool that can write the output into a binary file format. The creator explains it way better than I do:

Atop is an ASCII full-screen performance monitor for Linux that is capable of reporting the activity of all processes (even if processes have finished during the interval), daily logging of system and process activity for long-term analysis, highlighting overloaded system resources by using colors, etc. At regular intervals, it shows system-level activity related to the CPU, memory, swap, disks (including LVM) and network layers, and for every process (and thread) it shows e.g. the CPU utilization, memory growth, disk utilization, priority, username, state, and exit code.
In combination with the optional kernel module netatop, it even shows network activity per process/thread.
The atop tool website

Like you can imagine, having the above information is of course a nice treasure throve to find during an incident response, even if it is based on a pre-set interval. For the most basic information, you can at least extract process executions with their respective commandlines and the corresponding timestamp.

Since this is an open source tool we can just look at the structure definitions in C and lift them right into cstruct to start parsing. The atop tool itself offers the ability to parse written binary files as well, for example using this commend:

atop -PPRG -r <file>

For the rest of this blog entry we will look at parsing atop binary log files with python and dissect.cstruct. Mostly intended as a walkthrough of the thought process as well.

You can also skip reading the rest of this blog entry and jump to the code if you are impatient or familiar with similar thought processes.

Generating network connection information for experimentation purposes

In one of my last blogs I talked about visualizing firewall data for the purpose of analyzing the configuration and potentially identify security issues. As usual you can skip directly to the tool on my github, or keep on reading.

I wanted to continue playing with this approach to see how it could be improved from a fairly static tool, to a more graph database like approach. However, it turns out that it is somewhat difficult to obtain public firewall configuration files to play with. This is a similar problem to people doing machine learning in cybersecurity where obtaining datasets is still a bit of a challenge.

I decided to write a tool to generate this connection information and at the same time play as well as learn some things which I usually never bother with during development of proof-of-concept projects. So this time I decided to actually document my code, use type annotation and type hints as well as write some unit tests using pytest and actually figure out how argparse sub-commands work.

The tool intends to eventually offer the following options, but for now it only offers the plain option:

python generator_cli.py
usage: generator_cli.py [-h] [--debug] [--verbose] [--config CONFIG] [--mode {inner,outer,all}] {plain,time,apps,full} ...

Generate network connection with a varying level of metadata

options:
  -h, --help            show this help message and exit
  --debug               set debug level
  --verbose             set informational level
  --config CONFIG       Configuration file
  --mode {inner,outer,all}
                        Generate only inner vlan, outer vlan or all connections

Available sub-commands:
  {plain,time,apps,full}
                        Generate connection dataset with different levels of metadata
    plain               Only ip,src,ports
    time                Adds timestamp within desired range
    apps                Adds application details per connection
    full                Generates connections with timestamps & application information

Thanks for giving this a try! --DiabloHorn

The plain option generates the bare minimum of connection information:

{'srchost': '219.64.120.76', 'dsthost': '68.206.89.177', 'srcport': 64878, 'dstport': 3389}
{'srchost': '219.64.120.13', 'dsthost': '68.206.89.162', 'srcport': 63219, 'dstport': 3389}
{'srchost': '92.9.15.58', 'dsthost': '118.220.234.59', 'srcport': 49842, 'dstport': 3389}
{'srchost': '92.9.15.62', 'dsthost': '118.220.234.216', 'srcport': 57969, 'dstport': 445}

The main concept of the tool is that you can define VLAN names and some options and based on that information inner and outer connections for those VLANs are then generated. The --mode parameter controls which type of connections it will generate. The inner mode will only generate connections within the VLAN, the outer mode will generate only connections from the VLAN to other VLANs and the all mode will generate both.

I hope, but don’t promise, to eventually implement the other subcommands time for the generation of connection info within a defined time range (each connection being timestamped) and apps to generate connection info linked to applications like chrome, spotify, etc.

The following set of commands illustrate how you can use this tool to generate pretty pictures with yED

python generator_cli.py plain | jq '[.srchost,.dsthost,.dstport] | join(",")'

Which will output something along the lines of this, which after converting to an Excel document you can import with yED:

139.75.237.238,127.17.254.69,389
139.75.237.123,127.17.254.147,389
139.75.237.243,127.17.254.192,80
139.75.237.100,127.17.254.149,389

The featured image of this blogs shows all of the generated nodes, the following image provides details of one of those generated collection of nodes:

Details of a single collection of generated nodes

Secure slack bot; An exercise in threat modeling

secure, that’s one of those words that is capable of triggering a (usually negative) physical reaction with most people working in the security industry. Thing is, whenever someone claims secure, they usually forget to mention against what kind of threat(s) it is secure. So every once in a while I like to attempt to build something that is secure against a chosen threat model, just for the fun of the mental workout.

This blog will be about the exercise of performing a threat model of a slack bot I might build. It will not contain instructions on how to implement it, it will just be my train of thought while doing a threat model for the solution I want to build.

Most of the times it ends in the project not being finished or if I finish it people point out all kind of security issues in the solution. The latter being the main reason that I like doing these type of projects, since I’ve come to realize that somehow when you are designing a secure solution on your own, you will always end up with blind spots. While if you where to look at the same solution without building it you’d be spotting those exact same security issues. Thus you learn a lot from attempting to build a secure solution and have some else shoot some nice holes in it.

This time I decided to build a simple slack bot that would be capable of receiving a URL to an online Youtube video and download it for offline consumption. After some thinking I came to the following definition of the slack bot being secure:

- Hard target to casual and opportunistic attackers
- Hard target for memory corruption vulnerabilities
- When breached, constraint the attacker to pre-defined resources

So basically I want the solution to be secure against a curious user that uses the bot and decides he wants to hack it for the lulz. In addition when the attacker succeeds, I want that the attacker is only able to view / modify the information that I consider expendable. You’ll notice that I’m saying ‘when the attacker succeeds’ and not ‘if the attacker succeeds’. This is due to the fact that I always assume it will be breached, thus forcing myself to answer the question(s): “what’s the impact? can I accept it? if not, what should I mitigate?”. The other reason is of course that I’m a terrible sysadmin, and I expect myself to forget to patch stuff :( Besides the security requirements I also wanted to learn something new, so I decided I wanted to develop the bot using go.

So how do you proceed to design something with the above requirements? Normally I just perform a threat model-ish approach whereby I mentally think of the assets, attacks and the corresponding security controls to mitigate those attacks, sometimes with the aid of a whiteboard. This time however I decided to give the more formal drawing of a threat model a go. So i searched around, found this awesome blog and after a short while of (ab)using draw.io I ended up with the following result:

Let’s dive into this diagram and see how to further improve the security controls or security boundaries.

TL;DR Threat modeling is a fun and useful mental exercise and aids in spotting potential attacks you might forget to secure against. Also it is 2019, we should be using seccomp and apparmor or similar technologies much more frequent.

Continue reading “Secure slack bot; An exercise in threat modeling”

Introduction to analysing full disk encryption solutions

I’ve written a couple of times on the subject of boot loaders and full disk encryption, but I haven’t really explored it in more detail. With this blog post I hope to dive a bit deeper into how to actually start performing these type of analysis and why they are useful to perform. I’ll start with the usefulness first and then go into the part on how to do it, but will not be fully reversing a disk encryption boot loader. I won’t be doing a lot of hard-core reversing like finding vulnerabilities within the cryptographic operations or reversing custom filesystem implementations, but hopefully provide enough information to get started in the area of reversing unknown boot loaders.

The type of products with which you can use the approaches and techniques described in this blog post are the most useful when applied to full disk encryption (FDE) solutions that are configured to not require pre-boot authentication. The reason being, that you then could potentially obtain the disk decryption key. If the solution requires pre-boot authentication, the information that you can obtain, might be reduced to meta-data or ‘deleted’ files. Which brings us to the whole, why are these type of analysis useful?

The reason of why this is useful, I didn’t fully realise until a couple of years ago when a colleague introduced me to the wonders of all the (hidden) information that FDE solutions may contain. Let’s look at the type of information that you may encounter while investigating these solutions:

(encrypted) Hidden file systems
(obfuscated) Encryption keys
Usernames
(hashed/encrypted) Passwords
Windows domain credentials
Configuration information of the FDE solution
Files marked for deletion
Finding 0days and bypassing encryption

Based on the above list of items we can pretty much conclude that analysing FDE solutions is useful from an offensive as well as from a defensive point of view. It can either help us to breach a target network or obtain sensitive information as well as collect forensic evidence or aid us into understanding the specific cryptographic implementation to enable us to decrypt the disk and analyse it. The helper tools I’ve used in this blog post can be found here. Keep on reading if you want to know the rest of all the details and the process I usually follow. I’ll try to describe the following steps:

Creating a (partial) copy of the disk
Analysing the disk
Static & dynamic boot analysis

Since I don’t have easy access to disk encryption software with the exact features I’d like to analyse I’ll be using DiskCryptor as an example product.

For some reason it seems that the products with the most interesting features to reverse engineer have a horrendous ‘request trial’ process as well as not providing trials to a random researcher on the internet :( sad panda :(

The other reason to use DiskCryptor is the fact that it is open source, thus enabling people that want to get started with type of stuff to more easily understand difficult snippets of assembly. My personal approach to a lot of reversing challenges usually revolves around finding a similar open source variant first or finding the open source components used in the proprietary solution if applicable. Reason being that it makes your life a lot easier to understand not only general concepts, but also specific code quirks. A very nice explanation on finding as much information as possible before your start reversing is given by Alex Ionescu in his offensive con keynote ‘Reversing without reversing’.

Oh and there is no specific goal, besides just explaining my general thought process. As a side note I am no reverse engineering expert, so feel free to correct me :-)

Continue reading “Introduction to analysing full disk encryption solutions”

Identify a whitelisted IP address

An IP whitelist is one of the many measures applied to protect services, hosts and networks from attackers. It only allows those that are on the IP whitelist to access the protected resources and all others are denied by default. As attackers we have multiple obstacles to overcome if we want to bypass this and not always will it be possible. In my personal opinion there are two situation in which you will end up as an attacker:

You are NOT on the same network as your target
You are on the same network as your target

In the first situation you will (generally speaking) not be able to access or influence the network traffic of your target. This in turn enables the TCP/IP mechanisms to be useful and prevent you from accessing the resources, although maybe not prevent you from discovering who is on the whitelist.

In the second situation you will (generally speaking) be able to access or influence the network traffic of your target. This enables us as attacker to identify as well as bypass IP restrictions, by manipulating the TCP/IP protection mechanisms, to gain access to the protected resources.

For both situations there is an often overlooked detail which is: how do you know which IPs are on the whitelist? Mostly it is just assumed that either you know that upfront or discover that due to a connection being active while you initiate your attack. In this blog posts we’ll discuss the two situations and describe the techniques available to identify IPs on whitelist which have no active connection. A small helper script can be found here.

Continue reading “Identify a whitelisted IP address”

Brute forcing encrypted web login forms

There are a ton of ways to brute force login forms, you just need to google for it and the first couple of hits will usually do it. That is of course unless you have Burp in which case it will be sufficient for most of the forms out there. Sometimes however it will not be so straight forward and you’ll need to write your own tool(s) for it. This can be for a variety of reasons, but usually it boils down to either a custom protocol over HTTP(S) or some custom encryption of the data entered. In this post we are going to look at two ways of writing these tools:

Your own python script
A Greasemonkey script

Since to write both tools you first need to understand and analyse the non-default login form let’s do the analysis part first. If you want to follow along you’ll need the following tools:

Python
Burp free edition
Firefox with the Greasemonkey plugin
FoxyProxy
FireFox developer tools (F12)

Please note that even though we are using some commercially available software as an example, this is NOT a vulnerability in the software itself. Most login forms can be brute forced, some forms slower than others ;) As usual you can also skip the blog post and directly download the python script & the Greasemonkey script. Please keep in mind that they might need to be adjusted for your own needs.

Continue reading “Brute forcing encrypted web login forms”

Quantum Insert: bypassing IP restrictions

By now everyone has probably heard of Quantum Insert NSA style, if you haven’t then I’d recommend to check out some articles at the end of this post. For those who have been around for a while the technique is not new of course and there have been multiple tools in the past that implemented this type of attack. The tools enabled you to for example fully hijack a telnet connection to insert your own commands, terminate existing connections or just generally mess around with the connection. Most of the tools relied on the fact that they could intercept traffic on the local network and then forge the TCP/IP sequence numbers (long gone are the days that you could just predict them).

So it seems this type of attack, in which knowing the sequences numbers aids in forging a spoofed packet, has been used in two very specific manners:

Old Skool on local networks to inject into TCP streams
NSA style by globally monitoring connections and injecting packets

There is a third option however that hasn’t been explored yet as far as i know, which is using this technique to bypass IP filters for bi-directional communication. You might wonder when this might come in handy right? After all most of the attackers are used to either directly exfiltrate through HTTPS or in a worst case scenario fall back to good old DNS. These methods however don’t cover some of the more isolated hosts that you sometimes encounter during an assignment.

During a couple of assignments I encountered multiple hosts which were shielded by a network firewall only allowing certain IP addresses to or from the box. The following diagram depicts the situation:

As you can see in the above diagram, for some reason the owner of the box had decided that communication with internet was needed, but only to certain IP addresses. This got me thinking on how I could exfiltrate information. The easiest way was of course to exfiltrate the information in the same way that I had obtained access to the box, which was through SSH and password reuse. I didn’t identify any other methods of exfiltration during the assignment. This was of course not the most ideal way out, since it required passing the information through multiple infected hops in the network which could attract some attention from the people in charge of defending the network.

A more elegant way in my opinion would have been to directly exfiltrate from the machine itself and avoid having a continuous connection to the machine from within the network. In this post we are going to explore the solution I found for this challenge, which is to repurpose the well known quantum insert technique to attempt and build a bi-directional communication channel with spoofed IP addresses to be able to exfiltrate from these type of isolated hosts. If you are thinking ‘this only works if IP filtering or anti address spoofing is not enforced’ then you are right. So besides the on going DDOS attacks, this is yet another reason to block outgoing spoofed packets.

If you are already familiar with IP spoofing, forging packets and quantum insert you can also skip the rest of this post and jump directly to QIBA – A quantum insert backdoor POC. Please be aware that I only tested this in a lab setup, no guarantees on real world usage :)

Lastly as you are probably used to by now, the code illustrates the concept and proofs it works, but it’s nowhere near ready for production usage.

Continue reading “Quantum Insert: bypassing IP restrictions”

Python raw sockets sniffing & pcap saving

Even though we are pretty used to it, libpcap is not always present on systems. Usually, regardless of your goal, looking at traffic is actually pretty useful. In my experience this applies to offensive (pentesting, red team) work as well as defensive (incident response, network monitoring) work.

One of the first things that comes to mind, when libpcap is not available, is of course raw sockets, since these seem to be always available as long as you have the correct privileges. I’ve written previously about them as well as created some POC for backdoor purposes. Up until now raw sockets haven’t failed me, so when during a recent assignment I had to sniff traffic without libpcap I decided to write some Python code to achieve this. In case you are wondering, yes this was to further gather juicy information from unencrypted protocols like telnet, http and ftp.

A script nowadays never starts without a quick google query to save yourself the trouble of writing everything from scratch. So even though I enjoy writing a lot of things from scratch to learn, in this case I mainly adjusted an excellent example script from: http://askldjd.com/2014/01/15/a-reasonably-fast-python-ip-sniffer/

Adjusting the above script to save the data in pcap format was an easy undertaking and immediately useful. After waiting for a couple of minutes I got myself a nice pcap file which I could analyse on another machine with regular tools like tcpdump or wireshark.

You can find the script on the following gist

[python] Poor man’s forensics

So after a period of ‘lesser technical times’ I finally got a chance to play around with bits, bytes and other subjects of the information security world. A while back I got involved in a forensic investigation and participated with the team to answer the investigative questions. This was an interesting journey since a lot of things peeked my interest or ended up on one of my todo lists.

One of the reasons that my interest was peeked is that yes, you can use a lot of pre-made tools to process the disk images and after that processing is done you can start your investigation. However, there are still a lot of questions you could answer much quicker if you had a subset of that data available ‘instantly’. The other reason is that not all the tools understand all the filesystems out there, which means that if you encounter an exotic file system your options are heavily reduced. One of the tools I like and which inspired me for these quick & dirty scripts is ‘mac-robber‘ (be aware that it changes file times if the destination is not mounted read-only) since it’s able to process any file system as long as it’s mounted on an operating system on which mac-robber is able to run. An example of running mac-robber:

sudo mac-robber mnt/ | head
class|host|start_time
body|devm|1471229762
MD5|name|inode|mode_as_string|UID|GID|size|atime|mtime|ctime|crtime
0|mnt/.disk|0|dr-xr-xr-x|0|0|2048|1461191363|1461191353|1461191353|0
0|mnt/.disk/base_installable|0|-r–r–r–|0|0|0|1461191363|1461191316|1461191316|0
0|mnt/.disk/casper-uuid-generic|0|-r–r–r–|0|0|37|1461191363|1461191353|1461191353|0

You can even timeline the output if you want with mactime:

sudo mac-robber mnt/ | mactime -d | head
Date,Size,Type,Mode,UID,GID,Meta,File Name
Thu Jan 01 1970 01:00:00,2048,…b,dr-xr-xr-x,0,0,0,”mnt/.disk”
Thu Jan 01 1970 01:00:00,0,…b,-r–r–r–,0,0,0,”mnt/.disk/base_installable”
Thu Jan 01 1970 01:00:00,37,…b,-r–r–r–,0,0,0,”mnt/.disk/casper-uuid-generic”
Thu Jan 01 1970 01:00:00,15,…b,-r–r–r–,0,0,0,”mnt/.disk/cd_type”
Thu Jan 01 1970 01:00:00,60,…b,-r–r–r–,0,0,0,”mnt/.disk/info”

Now that’s pretty useful and quick! One of the things I missed however was the ability to quickly extend the tools as well as focus on just files. From a penetration testing perspective I find files much more interesting in an forensic investigation than directories and their meta-data. This is of course tied to the type of investigation you are doing, the goal of the investigation and the questions you need answered.

I decided to write a mac-robber(ish) python version to aid me in future investigations as well as learning a thing or two along the way. Before you continue reading please be aware that:

The scripts have not gone through extensive testing
Thus should not be blindly trusted to produce forensically sound output
The regular ‘professional’ tools are not perfect either and still contain bugs ;)

That being said, let’s have a look at the type of questions you can answer with a limited set of data and how that could be done with custom written tools. If you don’t care about my ramblings, just access the Github repo here. It has become a bit of a long article, so here are the ‘chapters’ that you will encounter:

What data do we want?
How do we get the data?
Working with the data, answering questions
1. Converting to body file format
2. Finding duplicate hashes
3. Permission issues
4. Entropy / file type issues
Final thoughts

Continue reading “[python] Poor man’s forensics”

Parsing the hiberfil.sys, searching for slack space

Implementing functionality that is already available in an available tool is something that has always taught me a lot, thus I keep on doing it when I encounter something I want to fully understand. In this case it concerns the ‘hiberfil.sys’ file on Windows. As usual I first stumbled upon the issue and started writing scripts to later find out someone had written a nice article about it, which you can read here (1). For the sake of completeness I’m going to repeat some of the information in that article and hopefully expand upon it, I mean it’d be nice if I could use this entry as a reference page in the future for when I stumble again upon hibernation files. Our goal for today is going to be to answer the following question:

What’s a hiberfil.sys file, does it have slack space and if so how do we find and analyze it?

To answer that question will hopefully be answered in the following paragraphs; we are going to look at the hibernation process, hibernation file, it’s file format structure, how to interpret it and finally analyze the found slack space. As usual you can skip the post and go directly to the code.

Hibernation process

When you put your computer to ‘sleep’ there are actually several ways in which it can be performed by the operating system one of those being the hibernation one. The hibernation process puts the contents of your memory into the hiberfil.sys file so that the state of all your running applications is preserved. By default when you enable hibernation the hiberfil.sys is created and filled with zeros. To enable hibernation you can run the following command in an elevated command shell:

powercfg.exe -H on

If you want to also control the size you can do:

powercfg.exe -H -Size 100

An interesting fact to note is that Windows 7 sets the size of the hibernation file size to 75% of your memory size by default. According to Microsoft documentation (2) this means that hibernation process could fail if it’s not able to compress the memory contents to fit in the hibernation file. This of course is useful information since it indicates that the contest of the hibernation file is compressed which usually will make basic analysis like ‘strings’ pretty useless.

if you use strings always go for ‘strings -a <inputfile>’ read this post if you are wondering why.

The hibernation file usually resides in the root directory of the system drive, but it’s not fixed. If an administrators wants to change the location he can do so by editing the following registry key as explained by this (3) msdn article:

Key Name: HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\
Value Name: PagingFiles
Type: REG_MULT_SZ
Data: C:\pagefile.sys 150 500
In the Data field, change the path and file name of the pagefile, along with the minimum and maximum file size values (in megabytes).

So if you are performing an incident response or forensic investigation make sure you check this registry key before you draw any conclusion if the hiberfil.sys file is absent from it’s default location. Same goes for creating memory images using hibernation, make sure you get the 100% and write it to a location which doesn’t destroy evidence or where the evidence has already been collected.

Where does the slack space come from you might ask? That’s an interesting question since you would assume that each time the computer goes into hibernation mode it would create a new hiberfil.sys file, but it doesn’t. Instead it will overwrite the current file with the contents it wants to save. This is what causes slack space, since if the new data is smaller in size than the already available files the data at the end of the file will still be available even if it’s not referenced by the new headers written to the file.

From a forensic standpoint that’s pretty interesting since the unreferenced but available data might contain important information to help the investigation along. If you are working with tools that automatically import / parse or analyse the hiberfil.sys file you should check / ask / test how they handle slack space. In a best case scenario they will inform you about the slack space and try to recover the information, in a less ideal scenario they will inform you that there is slack space but it’s not able to handle the data and in the worst case scenario it will just silently ignore that data and tell you the hibernation file has been processed successfully.

Continue reading “Parsing the hiberfil.sys, searching for slack space”

Writing your own blind SQLi script

We all know that sqlmap is a really great tool which has a lot of options that you can tweak and adjust to exploit the SQLi vuln you just found (or that sqlmap found for you). On rare occasions however you might want to just have a small and simple script or you just want to learn how to do it yourself. So let’s see how you could write your own script to exploit a blind SQLi vulnerability. Just to make sure we are all on the same page, here is the blind SQLi definition from OWASP:

Blind SQL (Structured Query Language) injection is a type of SQL Injection attack that asks the database true or false questions and determines the answer based on the applications response.

You can also roughly divide the exploiting techniques in two categories (like owasp does) namely:

content based
- The page output tells you if the query was successful or not
time based
- Based on a time delay you can determine if your query was successful or not

Of course you have dozens of variations on the above two techniques, I wrote about one such variation a while ago. For this script we are going to just focus on the basics of the mentioned techniques, if you are more interested in knowing how to find SQLi vulnerabilities you could read my article on Solving RogueCoder’s SQLi challenge. Since we are only focusing on automating a blind sql injection, we will not be building functionality to find SQL injections.

Before we even think about sending SQL queries to the servers, let’s first setup the vulnerable environment and try to be a bit realistic about it. Normally this means that you at least have to login, keep your session and then inject. In some cases you might even have to take into account CSRF tokens which depending on the implementation, means you have to parse some HTML before you can send the request. This will however be out of scope for this blog entry. If you want to know how you could parse HTML with python you could take a look at my credential scavenger entry.

If you just want the scripts you can find them in the example_bsqli_scripts repository on my github, since this is an entry on how you could write your own scripts all the values are hard coded in the script.

Continue reading “Writing your own blind SQLi script”

finding sub domains with search engines

Finding sub domains using DNS is common practice, for example fierce does a pretty nice job. Additionally fierce presents a nice overview of the possible ranges that belong to your target. For some odd reason I also like to find sub domains using search engines, even though this will deliver results that are far from exhaustive. In the past I wrote a perl script to do this, but since I’m becoming a fan of python I decided to rewrite it in python. For example using python-requests and beautifulsoup it only takes like ~10 lines to scrape the sub domains from a search engine page:

def getgoogleresults(maindomain,searchparams):
    regexword = r'(http://|https://){0,1}(.*)' + maindomain.replace('.','\.')
    try:
        content = requests.get(googlesearchengine,params=searchparams).content
    except:
        print >> sys.stderr, 'Skipping this search engine'
        return
    soup = BeautifulSoup(content)
    links = soup.find_all('cite')
    extract = re.compile(regexword)
    for i in links:
        match = extract.match(i.text)
        if match:
            res = match.group(2).strip() + maindomain
            if res not in subdomains:
                subdomains.append(res)

This script doesn’t parse all the result pages from the search engines. Actually it only parses the first page. This is because I wanted to keep it simple for the moment being and it helps to not get blocked that quickly. To compensate for the lack of crawling the results, the script uses multiple search engines and negates the results from one engine onto another. For example it performs queries like:

site:somedomain.tld -site:subdomain1.somedomain.tld

As said it compensates somewhat for the lack of crawling the results pages but it will surely fail to find all sub domains indexed on the search engines. This is how it looks like:

searchsubdomain.py hacktalk.net
blog.hacktalk.net
leaks-db.hacktalk.net
ns2.hacktalk.net
www.hacktalk.net

Which is exactly the moment when I realised I’d also would like the ip addresses that belong to the found domains. I wrote a separate script for that which uses the adns python bindings. This is how it looks like:

searchsubdomain.py hacktalk.net | dnsresolver.py 
ns2.hacktalk.net 209.190.32.59
www.hacktalk.net 209.190.32.59
leaks-db.hacktalk.net 209.190.32.59
blog.hacktalk.net 209.190.32.59

If you wonder why I wrote a new script that uses adns:

real 0m46.962s
user 0m0.904s
sys 0m0.180s

That’s the time it took to resolve 2280 hosts including a couple of 3 second delays to not hog the DNS server. Also for tasks like this (brute forcing sub domains with DNS) bash is your friend:

for i in `cat hosts.txt`;do echo $i”.hacktalk.net” >> hacktalkdomains.txt;done
dnsresolver.py hacktalkdomains.txt | grep -vi resverror

I copied the two scripts to my /usr/local/bin directory to be able to use them from anywhere on the cli. You can find them over here: https://github.com/DiabloHorn/DiabloHorn/tree/master/misc

Quick tiny python web proxy

Python just keeps amazing me, the following code is all you need to have a proxy up and running in like 10 seconds

from flask import Flask
from flask import request

import requests

app = Flask(__name__)


hosttorequest = 'www.cnn.com'

@app.route('/')
def root():
    r = requests.get('http://'+hosttorequest+'/')
    return r.content

@app.route('/<path:other>')
def other(other):
    r = requests.get('http://'+hosttorequest+'/'+other)
    return r.content
    
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)

Now this sure makes it easy to start hiding some stuff in there. To get it up and running just do: sudo python filename.py

Credential Scavenger

Just because it’s discarded it doesn’t mean it’s useless. Nowadays it doesn’t really matter which Google dork you use, but you’ll always hit some username/password dump. There are some nice tools out there to monitor pastebin (or any of the alternatives) for example:

But then what? you scraped/monitored or just F5’ed the website and are now sitting on a nice pile of potentially interesting information. You could of course try it out and see if it contained any working samples…chances are those are long gone by now. Luckily for us, we all know that people tend to reuse their password on multiple websites. So all we have to do is check their username on multiple (known) services and see if they have forgotten to change their password on any of them. Since it’s been a while that I’ve coded in python I decided to use python for the job. After all it seemed like fun to write something that could maybe remotely resemble a framework-thingie. It was easier then I thought, had to rewrite it a couple of times though due to poor design choices. Not saying it’s great now, but at least it seems to be able to perform all the tasks I’d like to have.

Now since this is just a POC for an idea, it’s non optimized, non threaded and non-usable for serious harvesting and testing of large amounts of data. Remember that in most countries it’s illegal to use someone else his credentials. For the development I’ve just created some testing accounts and tested it on them to see if the idea was viable and produced any results.
The core of the whole thing is like a couple of lines to dynamically load up the module classes:

def loadmodules(modulepath,configfile):
    """load modules & create class instances, returns a dictionary.

    Return dictionary is of the form: 
        :
    """
    ccc = parseconfig(configfile)
    loadedmodules = dict()
    for key in ccc:
        modulefilename = key
        if not key in loadedmodules:
            #load the module based on filename
            tempmodule = imp.load_source(modulefilename, "%s%s.py" % (modulepath,modulefilename))
            #find the class
            moduleclass = getattr(tempmodule,modulefilename.title())
            #instantiate the class
            moduleinstance = moduleclass()
            loadedmodules[key] = moduleinstance
    return loadedmodules

Then some basic ‘library’ functionality is provided on per protocol basis, at the moment it includes some ‘libs’ for imap, pop3 and HTTP forms and a small module for some sqlite DB operations. The whole thing can then be used as one pleases, either by building on top of it or by using the provided ‘simple_scavenger.py’ example. When you run the provided example it, it provides output on the CLI:

./simple_scavenger.py ../creds.txt 
{'hotmail': ['usernamehere', 'passwordhere', 'pop3'], 'yahoo': ['usernamehere', 'passwordhere', 'imap']}
{'linkedin': ['usernamehere', 'passwordhere', 'httpform'], 'gmail': ['usernamehere', 'passwordhere', 'imap']}

and stores it in the DB for easy retrieval:

sqlite3 creds.db "select * from creds"
usernamehere|passwordhere|pop3|hotmail|1353450068
usernamehere|passwordhere|imap|yahoo|1353450068
usernamehere|passwordhere|httpform|linkedin|1353450112
usernamehere|passwordhere|imap|gmail|1353450112

That’s all there is to it.

At the beginning of this post I said I’d build it to hopefully be some kind of framework-thingie, so let’s see how you could expand this to authenticate with the given credentials on another service.

Continue reading “Credential Scavenger”

What’s in a picture?

Most of us are familiar with steganography (stegano) who better to explain it then wikipedia:

Steganography is the art and science of writing hidden messages in such a way that no one, apart from the sender and intended recipient, suspects the existence of the message, a form of security through obscurity.

So who can guess what’s in the following picture:

Continue reading “What’s in a picture?”