<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:media="http://search.yahoo.com/mrss/"><channel><title><![CDATA[Patrik Hudak]]></title><description><![CDATA[Cybersecurity. Automation. Infinity.]]></description><link>https://0xpatrik.com/</link><image><url>https://0xpatrik.com/favicon.png</url><title>Patrik Hudak</title><link>https://0xpatrik.com/</link></image><generator>Ghost 3.0</generator><lastBuildDate>Fri, 10 Apr 2026 19:29:23 GMT</lastBuildDate><atom:link href="https://0xpatrik.com/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Subdomain Takeover: Going for High Impact]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>When others in the web application security community and I started publishing about subdomain takeover, it became increasingly hard to find new cases in public bug bounty programs. There was more competition than ever, but also, cloud providers such as AWS or Heroku started to implement mitigations</p>]]></description><link>https://0xpatrik.com/subdomain-takeover-impact/</link><guid isPermaLink="false">5fd8b48b99dbd2662ae9e80b</guid><dc:creator><![CDATA[Patrik Hudak]]></dc:creator><pubDate>Tue, 15 Dec 2020 14:15:33 GMT</pubDate><media:content url="https://0xpatrik.com/content/images/2020/12/takeover-2.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://0xpatrik.com/content/images/2020/12/takeover-2.jpg" alt="Subdomain Takeover: Going for High Impact"><p>When others in the web application security community and I started publishing about subdomain takeover, it became increasingly hard to find new cases in public bug bounty programs. 
There was more competition than ever, but also, cloud providers such as AWS and Heroku started to implement mitigations to prevent subdomain takeovers in the first place. At the same time, bug bounty programs began to set clear rules for subdomain takeover reports, which mostly fall into Medium severity.</p>
<p><img src="https://0xpatrik.com/content/images/2020/12/Screen-Shot-2020-12-15-at-14.45.00-1.png" alt="Subdomain Takeover: Going for High Impact"></p>
<p>For someone like me, who has deeply explored the realm of takeovers as well as their implications, this bothered me a lot. Not the fact that I am unable to find more <em>&quot;low-hanging fruit&quot;</em>, but rather that companies take a blinkered view of the impact subdomain takeovers can have on their infrastructure. I thus started to think about every possible escalation you can use to showcase the high impact this problem can have. This post explains my conclusions.</p>
<h2 id="cloudprovidermatter">Cloud Providers Matter</h2>
<p>Don't get me wrong. I still get a lot of subdomain takeover cases each month. However, as I explained above, most of the vulnerable cloud providers took a radical step and started to mitigate these problems in different ways:</p>
<ul>
<li>Disallowing registration of a custom domain name under a different account. For instance, if the domain <em>takeover.example.com</em> was registered under <em>account1</em>, the provider doesn't allow it to be registered under <em>account2</em></li>
<li>Adding domain verification. In order to add a custom domain, you are required to prove that you actually own it. This usually takes the form of adding a TXT record to the DNS zone, which, as an attacker, you of course don't have access to.</li>
<li>Adding entropy. Some services generate a random string that is a required part of the CNAME record, precisely to prevent takeovers. This is usually a strong enough defense, since API rate limits won't allow you to brute-force it (or the range of possibilities is simply too large). AWS ELB does this, for example.</li>
</ul>
<p>You should refer to <a href="https://github.com/EdOverflow/can-i-take-over-xyz">https://github.com/EdOverflow/can-i-take-over-xyz</a> in order to verify whether some provider is vulnerable or not.</p>
<p>In order to plan the possible escalation paths, you also need to know what the cloud provider allows you to do with the taken-over domain. Unless you can point the domain to your own VPS instance, you will be limited in functionality. For instance:</p>
<ul>
<li><strong>Amazon S3</strong> — Allows hosting static content, but you won't be able to execute dynamic content such as Python scripts on the backend.</li>
<li><strong>Microsoft Azure CloudApp</strong> — Allows deploying a dynamic container; however, you don't get the freedom of fully controlling the deployed environment.</li>
</ul>
<p>The best takeover you can hope for is Microsoft Traffic Manager. It lets you serve your own A records when the <em>victim</em> performs DNS resolution. Once you point it to your VPS instance, you are pretty much unrestricted.</p>
<p>It is also important to think about obtaining a valid SSL certificate. When the domain points to your VPS instance, it is straightforward to request a Let's Encrypt certificate using <em>certbot</em>, and the full process is done automatically. In cases such as Amazon Elastic Beanstalk, you are not able to install <em>certbot</em> in the environment. You can still obtain the certificate manually by creating a valid HTTP endpoint that serves the challenge string requested by Let's Encrypt validation. Certificates will be an important part of the next sections.</p>
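To make the manual path concrete, here is a minimal sketch of such a validation endpoint in Flask. The TOKEN and KEY_AUTH values are hypothetical stand-ins for what an ACME client in manual HTTP-01 mode prints out; this is a sketch, not the exact flow of any particular provider.

```python
import flask

# Hypothetical placeholders; certbot's manual HTTP-01 mode prints the
# real token and key-authorization values for you to serve.
TOKEN = 'example-token'
KEY_AUTH = 'example-token.account-key-thumbprint'

app = flask.Flask(__name__)

# Let's Encrypt fetches /.well-known/acme-challenge/<token> over plain
# HTTP and expects the key authorization string in the response body.
@app.route('/.well-known/acme-challenge/<token>')
def acme_challenge(token):
    if token == TOKEN:
        return KEY_AUTH
    flask.abort(404)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)
```

Once the validation succeeds, the CA issues the certificate and you can serve HTTPS from the environment even without certbot installed there.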
<p>Before going forward, I want to share a pretty interesting story from obtaining a certificate for a domain I had just taken over. The setting was just perfect: I was able to point the domain to my server and went for high impact. The first thing I tried was obtaining a valid SSL certificate. However, the automatic <em>certbot</em> process failed without any indication of why. My manual attempt against Let's Encrypt failed as well. I then opted for a paid certificate from a provider that accepted HTTP validation. It failed too. When I tried a third provider, things started to get clear. My request for a certificate was denied with a message that policy forbids issuing a new certificate for the domain ███.com. It was the first time I had come across something like that. You know a company takes web security seriously when it has this kind of defense-in-depth measure in place.</p>
<p>All of the following escalation paths pretty much go hand in hand. I will assume the most liberal takeover scenario (in other words, you control the whole instance). More restricted environments such as Elastic Beanstalk might require a couple of tweaks.</p>
<h2 id="xss">XSS</h2>
<p>My first use case is to showcase a stored XSS on the taken-over domain. The proof of concept is as straightforward as creating a new HTML file and pasting Javascript code into it. Stored XSS is sometimes valued even more highly than subdomain takeover. For the XSS payload itself, I usually go with <code>alert(document.cookie)</code>.</p>
<p>With XSS possible, you can also showcase a phishing scenario: clone the login page of the targeted organization and host it on the domain you now &quot;own&quot;. I was once able to take over a domain of the form <em>auth.example.com</em>, which is the perfect candidate for phishing attacks.</p>
<h2 id="accounttakeover">Account Takeover</h2>
<p>Usually, a login session cookie is set with the <em>secure</em> and <em>httpOnly</em> flags. The former means the cookie is only sent over HTTPS connections; the latter means it cannot be accessed from Javascript, only sent in HTTP requests to the backend server.</p>
<p>It is fairly common to bind a session cookie to a wildcard domain such as <code>*.example.com</code>. For instance, say there is a login gateway on <em>auth.example.com</em>, but multiple apps such as <em>customer.example.com</em> and <em>app.example.com</em> need access to the login session. An SSO solution thus tends to set the cookie on the wildcard domain. With the domain you have just taken over, you can read this session cookie easily.</p>
<p>To achieve a full account takeover, you need to lure the victim into visiting the domain that was vulnerable to subdomain takeover. A simple backend script can then retrieve the session cookie (note that this example simply reflects the values back to the user, but it can easily be modified to store them):</p>
<pre><code>#python3

import flask

app = flask.Flask(__name__)

@app.route('/')
def gimme_my_cookies():
    # Reflect every cookie the victim's browser sent for *.example.com
    return flask.jsonify(flask.request.cookies)

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=80)
</code></pre>
<p>Once you have the victim's session token, you can set the cookie in your own browser and access their account (assuming there are no other security protections).</p>
<h2 id="csrf">CSRF</h2>
<p>Similar to the examples above, CSRF can sometimes be performed when the target application assumes that the wildcard domain is <em>safe</em>. This is also true for inter-app communication using the <a href="https://developer.mozilla.org/en-US/docs/Web/API/Window/postMessage">postMessage API</a>, where the Javascript code is responsible for verifying the sender and allowing the action based on it. The regexes for this verification are usually written for <code>*.example.com</code>.</p>
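To illustrate the problem (in Python rather than the Javascript such checks usually live in), here is a sketch of the kind of wildcard origin check these handlers tend to use; the regex and function name are hypothetical:

```python
import re

# Any subdomain of example.com is trusted -- including one an attacker
# just acquired via subdomain takeover.
TRUSTED_ORIGIN = re.compile(r'^https://([a-z0-9-]+\.)*example\.com$')

def is_trusted(origin):
    return bool(TRUSTED_ORIGIN.match(origin))
```

A taken-over subdomain such as `https://takeover.example.com` passes this check exactly like the legitimate apps do, so every postMessage handler trusting the wildcard now also trusts the attacker.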
<h2 id="authenticationbypass">Authentication Bypass</h2>
<p>Rather than explaining it on my own, I would point you to the <a href="https://hackerone.com/reports/219205">great report</a> submitted by <a href="https://twitter.com/arneswinnen">Arne Swinnen</a> on HackerOne. Although the attack complexity is pretty high, in some cases you can achieve the same behavior.</p>
<p>Hopefully, this sheds new light on how you can think about subdomain takeover bugs in the future. If you know of any other escalation path, I will be more than happy to update the post with credits.</p>
<p>Until next time!</p>
<p><a href="https://twitter.com/0xpatrik">Patrik</a><br>
<a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="false">Follow @0xpatrik</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p><a href="http://buymeacoff.ee/0xpatrik" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Subdomain Takeover: Going for High Impact" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;"></a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Subdomain Enumeration: Filter Wildcard Domains]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>When doing <a href="https://0xpatrik.com/subdomain-enumeration-2019/">subdomain enumeration</a>, you are likely to encounter a domain that is a wildcard. Such domains respond to DNS queries with a record/records, which are not explicitly defined in the DNS zone. In other words, if the DNS zone does not hold a record for a particular subdomain,</p>]]></description><link>https://0xpatrik.com/wildcard-domains/</link><guid isPermaLink="false">5e9d59cb99dbd2662ae9e6b4</guid><dc:creator><![CDATA[Patrik Hudak]]></dc:creator><pubDate>Mon, 20 Apr 2020 17:51:39 GMT</pubDate><media:content url="https://0xpatrik.com/content/images/2020/04/filter-wildcards.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://0xpatrik.com/content/images/2020/04/filter-wildcards.jpg" alt="Subdomain Enumeration: Filter Wildcard Domains"><p>When doing <a href="https://0xpatrik.com/subdomain-enumeration-2019/">subdomain enumeration</a>, you are likely to encounter a domain that is a wildcard. Such domains respond to DNS queries with a record/records, which are not explicitly defined in the DNS zone. In other words, if the DNS zone does not hold a record for a particular subdomain, a fallback is made to its wildcard entry.</p>
<p><img src="https://0xpatrik.com/content/images/2020/04/zone.png" alt="Subdomain Enumeration: Filter Wildcard Domains"></p>
<p>In the example above, <code>admin.0xpatrik.com</code> and <code>sub.0xpatrik.com</code> respond with <code>1.2.3.4</code> and <code>4.3.2.1</code> respectively, while any other subdomain responds with <code>1.1.1.1</code>, not NXDOMAIN! This is valid, yet obscure, DNS server behavior.</p>
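In other words, the zone from the screenshot boils down to something like this (a hypothetical BIND-style fragment):

```
$ORIGIN 0xpatrik.com.
admin  IN  A  1.2.3.4
sub    IN  A  4.3.2.1
*      IN  A  1.1.1.1   ; fallback for every name not listed above
```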
<p>Such behavior can cause a lot of trouble, because you will start enumerating targets that don't even exist.</p>
<p>This post presents techniques to detect and filter the wildcard domains during subdomain enumeration.</p>
<h2 id="detection">Detection</h2>
<p>One of the simplest indicators of a wildcard domain is a large number of valid subdomains. The simplest way to confirm it is to run a DNS query on a subdomain that you are certain does not exist:</p>
<p><img src="https://0xpatrik.com/content/images/2020/04/totallynotexistent-1.png" alt="Subdomain Enumeration: Filter Wildcard Domains"></p>
<p>As you can see, this query returned valid results! That is the first hint that you are probably dealing with a wildcard domain. To verify it, you can run one more test:</p>
<p><img src="https://0xpatrik.com/content/images/2020/04/wildcard-1.png" alt="Subdomain Enumeration: Filter Wildcard Domains"></p>
<p>A name starting with <code>*</code> (<code>*.example.com</code>) denotes a wildcard entry. As you can see, this query also returned a valid record set in the ANSWER section.</p>
<p>Perfect, you are now able to recognize wildcard domains. The real world is, however, a little more complicated. There is no rule that the wildcard entry must sit directly under the root domain. Take, for example, <code>*.amazonaws.com</code> and <code>*.s3.amazonaws.com</code>. As you will see after running queries for these domains, only the latter responds with a valid record. Therefore, you need to determine at which <em>root</em> or <em>subdomain</em> level the wildcard applies.</p>
<h2 id="filtration">Filtration</h2>
<p>We want to correctly separate the non-existing domains from the existing ones. With that in hand, we can continue with recon and enumeration as if there were no wildcard domain in the first place. However, this isn't straightforward in all cases:</p>
<ul>
<li>Some configurations allow wildcard entries but do not respond to <code>*.example.com</code> query.</li>
<li>Some configurations allow all subdomains to resolve to one domain and different content is served based on <code>Host</code> HTTP header (like <code>*.herokuapp.com</code>).</li>
</ul>
<p>With this in mind, we can build a heuristic for filtering out non-existing subdomains under roots that use wildcard entries:</p>
<ol>
<li>Given a domain (<code>something.sub.example.com</code>), obtain its records (e.g., 1.1.1.1)</li>
<li>Replace the highest subdomain level with <code>*</code> and try to resolve it (<code>*.sub.example.com</code>)</li>
<li>If you get no results, continue one level up until you reach the root domain (<code>*.example.com</code>)</li>
<li>If you end up with no results at all, there is no wildcard domain and the domain from step 1 is likely valid.</li>
<li>If you get any response to a wildcard query, compare it to the results from step 1. If the results are equal, the domain is likely not valid.</li>
<li>If they are not equal, the domain is likely valid.</li>
</ol>
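The steps above can be sketched as follows. The `resolve` callable is an assumption standing in for whatever resolution backend you use (dnspython, parsed massdns output, ...); it should return the set of A records for a name, or an empty set when there are none:

```python
def is_real_subdomain(domain, root, resolve):
    """Heuristic wildcard filter: True if `domain` is likely a real host,
    False if it probably just matches a wildcard entry."""
    records = resolve(domain)                 # step 1: records for the domain
    if not records:
        return False                          # doesn't resolve at all
    labels = domain.split('.')
    depth = len(labels) - len(root.split('.'))
    # Steps 2-3: try '*.sub.example.com', then '*.example.com', ...
    for i in range(1, depth + 1):
        wildcard = resolve('.'.join(['*'] + labels[i:]))
        if wildcard:
            # Steps 5-6: the deepest responding wildcard decides;
            # equal record sets mean the name is just the wildcard fallback
            return wildcard != records
    return True                               # step 4: no wildcard anywhere
```

With a toy resolver backed by a dict you can check that a name whose records equal the wildcard's gets filtered out, while a host with distinct records survives.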
<p>It is important to compare against the wildcard results of the deepest matching level. For instance, if both <code>*.sub.example.com</code> and <code>*.example.com</code> respond with valid results, you compare against the results for <code>*.sub.example.com</code>.</p>
<p>I deliberately say <em>records</em>, because there are instances where wildcard queries respond with multiple records (e.g. <code>*.herokuapp.com</code>). When implementing automated filtering, you should compare <em>sets</em> instead of <em>lists</em>, since the records can be returned in any order.</p>
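A quick illustration of the difference:

```python
# The same wildcard resolved twice, records returned in a different order
first = ['54.1.2.3', '54.4.5.6']
second = ['54.4.5.6', '54.1.2.3']

assert first != second             # list comparison is order-sensitive
assert set(first) == set(second)   # set comparison treats them as equal
```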
<p>The last issue I want to mention is anycast DNS. I have seen cases where the wildcard domain responds with multiple values, but presents only one at a time. Automating the filtering then gets really tough, as you first need to enumerate all the possible values to compare the potential subdomains against.</p>
<p>Until next time!</p>
<p><a href="https://twitter.com/0xpatrik">Patrik</a><br>
<a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="false">Follow @0xpatrik</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p><a href="http://buymeacoff.ee/0xpatrik" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Subdomain Enumeration: Filter Wildcard Domains" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;"></a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Subdomain Enumeration: Doing it a Bit Smarter]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>My <a href="https://0xpatrik.com/subdomain-enumeration-2019/">last post</a> about subdomain enumeration received great feedback. In the meantime, I thought of some other improvements I could make to increase the chances of finding new assets. This post presents a new tool that resulted in several critical reports during the past weeks.</p>
<h2 id="currentstate">Current state</h2>
<p>When I use</p>]]></description><link>https://0xpatrik.com/subdomain-enumeration-smarter/</link><guid isPermaLink="false">5dd051c7140a9354e2f0164d</guid><dc:creator><![CDATA[Patrik Hudak]]></dc:creator><pubDate>Sun, 29 Sep 2019 16:17:46 GMT</pubDate><media:content url="https://0xpatrik.com/content/images/2019/09/dnsgen.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://0xpatrik.com/content/images/2019/09/dnsgen.jpg" alt="Subdomain Enumeration: Doing it a Bit Smarter"><p>My <a href="https://0xpatrik.com/subdomain-enumeration-2019/">last post</a> about subdomain enumeration received great feedback. In the meantime, I thought of some other improvements I could make to increase the chances of finding new assets. This post presents a new tool that resulted in several critical reports during the past weeks.</p>
<h2 id="currentstate">Current state</h2>
<p>When I use <a href="https://github.com/infosec-au/altdns">altdns</a>, I use it solely to generate possibilities. Even though it contains its own DNS resolver, it is much wiser to use a faster tool such as <a href="https://github.com/blechschmidt/massdns">massdns</a>.</p>
<p>Firstly, let's explain what altdns does and why it works. Imagine having a list of active domain names that you found for the target:</p>
<pre><code>app1.example.com
customer.example.com
...
</code></pre>
<p>Developers usually test an application before it goes into production. In my experience, they use domain prefixes/suffixes such as <code>staging</code> or <code>test</code> to distinguish between the testing and production environments. Thus, the <em>sibling domains</em> for the above examples might look like:</p>
<pre><code>app1-staging.example.com
test-customer.example.com
...
</code></pre>
<p>Altdns does exactly this: it generates the possible combinations of original domain with the <em>words</em> from the wordlist (<a href="https://github.com/infosec-au/altdns/blob/master/words.txt">example</a>). To make altdns generate possibilities, you simply run:</p>
<p><code>$ python altdns.py -i input_domains.txt -o ./output/path -w altdns/words.txt</code></p>
<p>This command generates a huge list of possible domain names for the target. I say <em>possible</em>, because most of them do not exist. To verify the existence of some domain, you need to run a DNS resolution.</p>
<p>When I looked under the hood of altdns, I noticed that it is missing one <strong>crucial</strong> thing: targets may use custom words in their domain names. That's why you see many people in the community recommend <em>&quot;looking for patterns&quot;</em> during the recon phase. Unfortunately, this is a pretty vague statement, and there is no automated way to do it.</p>
<p>By custom words, I mean words that are not included in your base wordlist. These words are usually unique to the target environment, such as the name of the company or the name of the application. The domain names might look like:</p>
<pre><code>pkjapp-testing.example.com
customers-indiadatacenter.example.com
...
</code></pre>
<p>You see, <code>pkjapp</code> and <code>indiadatacenter</code> are not <em>words</em> you would think to include in your wordlist.</p>
<blockquote>
<p><strong>YOU CANNOT INCLUDE ALL THESE WORDS IN YOUR WORDLIST</strong>. It is much wiser to smartly extend your wordlist per target.</p>
</blockquote>
<h2 id="alternations">Alternations</h2>
<p>I created a tool that pretty much replaces the altdns functionality and adds several extra layers. It uses a generic <a href="https://github.com/ProjectAnte/dnsgen/blob/master/dnsgen/words.txt">wordlist</a> which is automatically extended when needed.</p>
<blockquote>
<p>You can find the source code <a href="https://github.com/ProjectAnte/dnsgen">here</a>!</p>
</blockquote>
<p>Let's look into the techniques that happen behind the scenes:</p>
<p><em>(For demo purposes, let's say that wordlist contains just one word: <code>stage</code>)</em></p>
<ul>
<li>
<p><strong>Insert word on every index</strong> — Creates new subdomain levels by inserting the words between existing levels. <code>foo.example.com</code> -&gt; <code>stage.foo.example.com</code>, <code>foo.stage.example.com</code></p>
</li>
<li>
<p><strong>Insert num on every index</strong> — Creates new subdomain levels by inserting the numbers between existing levels. <code>foo.bar.example.com</code> -&gt; <code>1.foo.bar.example.com</code>, <code>foo.1.bar.example.com</code>, <code>01.foo.bar.example.com</code>, <code>...</code></p>
</li>
<li>
<p><strong>Increase/Decrease num found</strong> — If the number is found in an existing subdomain, increase/decrease this number without any other alteration. <code>foo01.example.com</code> -&gt; <code>foo02.example.com</code>, <code>foo03.example.com</code>, <code>...</code></p>
</li>
<li>
<p><strong>Prepend word on every index</strong> — On every subdomain level, prepend existing content with <code>WORD</code> and <code>WORD-</code>. <code>foo.example.com</code> -&gt; <code>stagefoo.example.com</code>, <code>stage-foo.example.com</code></p>
</li>
<li>
<p><strong>Append word on every index</strong> — On every subdomain level, append existing content with <code>WORD</code> and <code>WORD-</code>. <code>foo.example.com</code> -&gt; <code>foostage.example.com</code>, <code>foo-stage.example.com</code></p>
</li>
<li>
<p><strong>Replace word with word</strong> — If a word longer than three characters is found in an existing subdomain, replace it with other words from the wordlist <em>(assuming our wordlist has more than one word)</em>. <code>stage.foo.example.com</code> -&gt; <code>otherword.foo.example.com</code>, <code>anotherword.foo.example.com</code>, <code>...</code></p>
</li>
<li>
<p><strong>Extract custom words</strong> — Extend the wordlist based on the target's domain naming conventions. Such words are either whole subdomain levels, or come from splitting a level on <code>-</code>. For instance, <code>mapp1-current.datastream.example.com</code> yields the words <code>mapp1</code>, <code>current</code>, <code>datastream</code>. To prevent the wordlist from exploding, a user-defined minimum <em>word length</em> is used for extraction. The default value is <strong>6</strong>, meaning only words strictly longer than <strong>5</strong> characters are included (from the previous example, <code>mapp1</code> does not satisfy this condition).</p>
</li>
</ul>
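To make a few of these techniques concrete, here is a rough sketch in Python. The function names and the one-word wordlist are mine for illustration; they are not dnsgen's actual internals:

```python
WORDS = ['stage']  # stand-in for the wordlist

def _levels(domain, root):
    # Subdomain levels left of the root: 'foo.bar.example.com' -> ['foo', 'bar']
    return domain[:-len(root) - 1].split('.')

def insert_word_every_index(domain, root):
    """foo.example.com -> stage.foo.example.com, foo.stage.example.com"""
    sub = _levels(domain, root)
    for word in WORDS:
        for i in range(len(sub) + 1):
            yield '.'.join(sub[:i] + [word] + sub[i:] + [root])

def prepend_append_word(domain, root):
    """foo.example.com -> stagefoo..., stage-foo..., foostage..., foo-stage..."""
    sub = _levels(domain, root)
    for word in WORDS:
        for i, level in enumerate(sub):
            for variant in (word + level, word + '-' + level,
                            level + word, level + '-' + word):
                yield '.'.join(sub[:i] + [variant] + sub[i + 1:] + [root])

def extract_custom_words(domains, root, min_len=6):
    """mapp1-current.datastream.example.com -> {'current', 'datastream'}"""
    words = set()
    for domain in domains:
        for level in _levels(domain, root):
            words.update(t for t in level.split('-') if len(t) >= min_len)
    return words
```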
<p><img src="https://0xpatrik.com/content/images/2019/09/dnsgen-1.png" alt="Subdomain Enumeration: Doing it a Bit Smarter"></p>
<p>Refer to GitHub <a href="https://github.com/ProjectAnte/dnsgen">project's page</a> to learn more about installation and usage.</p>
<p>Until next time!</p>
<p><a href="https://twitter.com/0xpatrik">Patrik</a><br>
<a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="false">Follow @0xpatrik</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p><a href="http://buymeacoff.ee/0xpatrik" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Subdomain Enumeration: Doing it a Bit Smarter" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;"></a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Subdomain Enumeration: 2019 Workflow]]></title><description><![CDATA[After some heavy testing, I improved my subdomain enumeration game significantly. In this post, I want to share my thoughts about how to do subdomain enumeration.]]></description><link>https://0xpatrik.com/subdomain-enumeration-2019/</link><guid isPermaLink="false">5dd051c7140a9354e2f0163f</guid><dc:creator><![CDATA[Patrik Hudak]]></dc:creator><pubDate>Thu, 06 Jun 2019 13:24:18 GMT</pubDate><media:content url="https://0xpatrik.com/content/images/2019/06/subdomain-enumeration.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://0xpatrik.com/content/images/2019/06/subdomain-enumeration.jpg" alt="Subdomain Enumeration: 2019 Workflow"><p>You are probably shaking your head that this is another post about subdomain enumeration. I have <a href="https://blog.sweepatic.com/art-of-subdomain-enumeration/">written</a> about it in the past, and so have many other security folks. But things have changed, and I noticed that the results I was getting were not optimal. Don't get me wrong; I still got better results than most people. But there was room for improvement. After some heavy testing, I improved my subdomain enumeration game significantly. In this post, I want to share (some of) my thoughts about how to do subdomain enumeration.</p>
<p>Firstly, I am all about efficiency: I want the best results as soon as possible. I have glued together the best tools <em>&quot;on the market&quot;</em> to come up with an efficient solution that works. If you remember from my past posts, I want a continuous reconnaissance process rather than one-time shots. These are the tools/sources for it:</p>
<ul>
<li><a href="https://github.com/OWASP/Amass">Amass</a></li>
<li>Rapid7 <a href="https://opendata.rapid7.com/sonar.fdns_v2/">Forward DNS dataset</a></li>
<li><a href="https://github.com/assetnote/commonspeak2-wordlists">commonspeak2</a> wordlist</li>
<li><a href="https://github.com/blechschmidt/massdns">massdns</a></li>
<li><a href="https://github.com/infosec-au/altdns">altdns</a> (optional)</li>
</ul>
<p>This is the combination that brings the best results for the majority of the targets I have encountered. Now, let's bring the full process:</p>
<ol>
<li>Use <code>amass</code> to gather passive data</li>
<li>Retrieve subdomains from Rapid7 FDNS</li>
<li>Generate possibilities from <code>commonspeak2</code></li>
<li>Run <code>massdns</code> on input from step #1-#3</li>
<li>Run <code>altdns</code> on the result set</li>
<li>Run <code>massdns</code> on input from step #5</li>
</ol>
<p>Steps #5 and #6 are optional and take most of the time. If you don't run them, you will still get great results.</p>
<p><img src="https://0xpatrik.com/content/images/2019/06/enumeration.png" alt="Subdomain Enumeration: 2019 Workflow"></p>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><h2 id="amass">Amass</h2>
<p>I've said it hundreds of times: amass is my go-to tool for primary subdomain enumeration. Forget <a href="https://github.com/aboul3la/Sublist3r">Sublist3r</a> and <a href="https://github.com/michenriksen/aquatone">aquatone</a>. <a href="https://github.com/subfinder/subfinder">Subfinder</a> is a good alternative to amass, but I have two problems with it:</p>
<ul>
<li>It does not have as many sources as amass</li>
<li>It does things too <em>&quot;nicely&quot;</em>. In other words, you need API keys for many of its services.</li>
</ul>
<p>Why did I say to use amass only for passive data? Its built-in DNS resolver is just too slow. Amass should provide data from various sources; we have better tools for DNS resolution. (I have spoken about this with the <a href="https://twitter.com/jeff_foley">amass author</a>; the resolution capabilities might be improved in the future.)</p>
<p>To retrieve passive data using amass, simply run:</p>
<pre><code>amass enum --passive -d &lt;DOMAIN&gt;
</code></pre>
<!--kg-card-end: markdown--><h2 id="subdomains-from-rapid7-fdns">Subdomains from Rapid7 FDNS</h2><p>Nothing surprising here. FDNS dataset is just a great way to enhance the results that amass brings. You can now use <a href="https://blog.rapid7.com/2018/10/16/how-to-conduct-dns-reconnaissance-for-02-using-rapid7-open-data-and-aws/">AWS Athena</a> to query the FDNS.</p><figure class="kg-card kg-image-card"><img src="https://0xpatrik.com/content/images/2018/10/Domain_Infrastructure.jpg" class="kg-image" alt="Subdomain Enumeration: 2019 Workflow"></figure><p>(Source: <em>blog.rapid7.com</em>)</p><!--kg-card-begin: markdown--><p>Indeed, you don't need to use AWS to retrieve the subdomains, it is just much more convenient. You can definitely parse it on your own using the <a href="https://0xpatrik.com/project-sonar-guide/">guide</a> I have written in the past.</p>
<!--kg-card-end: markdown--><h2 id="possibilities-from-commonspeak2">Possibilities from commonspeak2</h2><p>I have to confess. I never liked traditional brute-force method for subdomain enumeration. However, for some reason, <a href="https://github.com/assetnote/commonspeak2-wordlists">commonspeak2</a> wordlist just works. To generate the possibilities, you can use this simple Python snippet:</p><pre><code>scope = '&lt;DOMAIN&gt;'
wordlist = open('./commonspeak2.txt').read().split('\n')

for word in wordlist:
    if not word.strip(): 
        continue
    print('{}.{}'.format(word.strip(), scope))
    </code></pre><h2 id="run-massdns">Run massdns</h2><p>At this stage, you have a huge list of <strong>potential</strong> subdomains for your target. The potential means that they might exist and they might not. To verify that, you need to run an active DNS resolution. For that, <em>massdns</em> is the perfect tool. It is up to you how you approach the automation, I like to do it using Python:</p><!--kg-card-begin: markdown--><pre><code>#python3

import json
import subprocess

RESOLVERS_PATH = '/path/to/resolvers.txt'

def _exec_and_readlines(cmd, domains):

    domains_str = bytes('\n'.join(domains), 'ascii')
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, stdin=subprocess.PIPE)
    stdout, stderr = proc.communicate(input=domains_str)

    return [j.decode('utf-8').strip() for j in stdout.splitlines() if j]

def get_massdns(domains):
    massdns_cmd = [
        'massdns',
        '-s', '15000',
        '-t', 'A',
        '-o', 'J',
        '-r', RESOLVERS_PATH,
        '--flush'
    ]

    processed = []

    for line in _exec_and_readlines(massdns_cmd, domains):
        if not line:
            continue

        processed.append(json.loads(line.strip()))

    return processed

print(get_massdns(['example.com', 'sub.example.com']))
</code></pre>
<p>With the <code>-o J</code> flag, massdns outputs the results in the so-called <a href="http://ndjson.org">ndjson format</a>: one JSON object per line. We then enumerate the lines and parse each one as a standalone JSON document.</p>
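<p>As a sketch of what you might do with the parsed records, here is a helper that pulls out the names that actually resolved. The <code>status</code> and <code>data.answers</code> field names are assumptions based on massdns's ndjson output, so double-check them against your massdns version:</p>

```python
def resolved_names(records):
    '''Filter massdns ndjson records down to names with a successful answer.

    Assumes each record looks roughly like:
    {"name": "sub.example.com.", "status": "NOERROR",
     "data": {"answers": [{"type": "A", "data": "1.2.3.4"}, ...]}}
    '''
    names = set()
    for record in records:
        if record.get('status') != 'NOERROR':
            continue
        answers = record.get('data', {}).get('answers', [])
        if answers:
            # massdns appends a trailing dot to domain names
            names.add(record['name'].rstrip('.'))
    return sorted(names)
```

<p>You would feed it the output of <code>get_massdns</code> from above to get a clean list of live subdomains.</p>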
<h2 id="altdns">Altdns</h2>
<p>This is an optional tool in my workflow which might bring more fruit to the table (read: more subdomains found). It works by creating permutations of existing domain names. For instance, given the domain name <code>sub.example.com</code>, altdns might provide possible permutations in the form of:</p>
<ul>
<li><code>sub-dev.example.com</code></li>
<li><code>sub01.example.com</code></li>
<li>...</li>
</ul>
<p>You get the point. The idea is to generate such possibilities and then resolve them all at once to find the hits. It is basically a smart way of doing brute force. There is a very nice <a href="https://www.foo.be/papers/sdbf.pdf">research paper</a> written about this topic. The paper, however, introduces much more advanced techniques in this area. These are not yet implemented in altdns; I am currently working on an alternative tool, stay tuned!</p>
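<p>To make the idea concrete, here is a naive permutation generator. It is nowhere near altdns's full feature set; the separators and word placements below are purely illustrative:</p>

```python
def permutations(subdomain, words):
    '''Generate altdns-style permutations of a single subdomain.'''
    label, _, parent = subdomain.partition('.')
    results = set()
    for word in words:
        results.add('{}-{}.{}'.format(label, word, parent))  # sub-dev.example.com
        results.add('{}{}.{}'.format(label, word, parent))   # subdev.example.com
        results.add('{}.{}.{}'.format(word, label, parent))  # dev.sub.example.com
    return sorted(results)

# permutations('sub.example.com', ['dev', '01'])
```

<p>The generated list then goes straight into massdns, exactly like the commonspeak2 candidates.</p>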
<p>To start the altdns for our purposes, we can run:</p>
<pre><code>python altdns.py -i input_domains.txt -o ./output/path -w altdns/words.txt
</code></pre>
<p>Why do we run <code>massdns</code> twice? Altdns is a great tool for generating variations of domain names. It, however, returns a huge list of possibilities. Since DNS resolution takes time, it is better to first resolve the smaller list of <em>active</em> domains after step #3. Altdns can then take only the subdomains that are proven to be active, which results in a smaller list to resolve in step #6.</p>
<p>You might be wondering why I don't usually provide full scripts that are described in my articles. I think that for those of you who are willing to learn, it is better to try and glue these things together by yourself. Trust me, it is the best thing you can do if you are really serious about this stuff. You can always ask me questions on <a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw">Twitter</a>.</p>
<p>Until next time!</p>
<p><a href="https://twitter.com/0xpatrik">Patrik</a><br>
<a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="false">Follow @0xpatrik</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p><a href="http://buymeacoff.ee/0xpatrik" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Subdomain Enumeration: 2019 Workflow" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;"></a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Subdomain Takeover: Second Order Bugs]]></title><description><![CDATA[Subdomain takeover verification can be extended with second order bugs verification. This verification extends potential attack surface using subdomain takeover.]]></description><link>https://0xpatrik.com/second-order-bugs/</link><guid isPermaLink="false">5dd051c7140a9354e2f01645</guid><dc:creator><![CDATA[Patrik Hudak]]></dc:creator><pubDate>Thu, 20 Dec 2018 11:00:00 GMT</pubDate><media:content url="https://0xpatrik.com/content/images/2018/12/takeover.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://0xpatrik.com/content/images/2018/12/takeover.jpg" alt="Subdomain Takeover: Second Order Bugs"><p>Recently, I was trying to extend my <em>autonomous bug bounty finding engine</em>. My first thought was to incorporate second order bugs into the subdomain takeover scanning process. In this post, I describe how I approached this challenge and how this process can be automated efficiently. There is a <a href="https://edoverflow.com/2017/broken-link-hijacking/">great article</a> written by <a href="https://twitter.com/EdOverflow">EdOverflow</a>, but I feel there is a bit more to say about this topic.</p>
<p>Second order bugs happen when a website uses a <em>&quot;takeoverable domain&quot;</em> in the wrong place. Imagine a website that includes a Javascript file from an external domain. What happens when that domain is vulnerable to subdomain takeover? When the Javascript file is not available (e.g., <em>the domain expired</em>), the browser fails quietly. If the file is not essential to the website's functionality (e.g., <em>live chat plugin</em>), there is a good chance the broken behavior goes unnoticed by administrators. We can register the domain and host our own Javascript file, creating a very smooth XSS. It is a classic subdomain takeover example, as I wrote <a href="https://0xpatrik.com/subdomain-takeover-basics/">before</a>. The diagram below illustrates this idea.</p>
<p><img src="https://0xpatrik.com/content/images/2018/12/second-order-1.png" alt="Subdomain Takeover: Second Order Bugs"></p>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><p>Why exactly is this called a second-order bug? Well, first-order subdomain takeover bugs are just subdomains of the target program that are vulnerable to subdomain takeover. Second-order makes it clear that we are extending the &quot;<em>reach</em>&quot; of our scans to external domains whose takeover can make a significant impact.</p>
<p>There is at least one open-source tool available for these scans: <a href="https://github.com/mhmdiaa/second-order">second-order</a>. The problem is that it does not examine all the fields described in this post. It is, however, an excellent starting point. I just don't feel confident extending its capabilities since it is written in Go.</p>
<p>Second order bugs are not limited to Javascript files. We can extend this idea to things like CORS. I recommend looking into this <a href="https://hackerone.com/reports/426165">HackerOne report</a> to see what I am talking about. The basic idea is this: we need to extract links and domains from places where a subdomain takeover would cause real trouble. What exactly are these places?</p>
<ul>
<li>script tags — impact: <em>XSS</em></li>
<li>a tags — impact: <em>social engineering</em></li>
<li>link tags — impact: <em>clickjacking</em></li>
<li>CORS response headers — impact: <em>data exfiltration</em></li>
</ul>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><h2 id="implementation">Implementation</h2>
<p>Now, let's look at how to implement this using Python. The high-level process looks like this:</p>
<p><img src="https://0xpatrik.com/content/images/2018/12/second-order-2.png" alt="Subdomain Takeover: Second Order Bugs"></p>
<p>The first part consists of <em>crawling</em> in some shape or form. I will skip the technical details of this; it is up to you to decide what strategy to use. You may:</p>
<ul>
<li>Request only <code>/</code> for every subdomain found</li>
<li>Do a limited crawl (e.g., BFS tree with height 1)</li>
<li>Do a full crawl</li>
</ul>
<p>The HTML files resulting from any of these <em>crawls</em> (well, the first option is not a crawl by any means) are fed into the extraction machine.</p>
<p>The extraction process takes several steps to complete. I used single functions to take care of each step. Let's look at it more deeply:</p>
<pre><code>from bs4 import BeautifulSoup

def extract_javascript(domain, source_code):
    '''
    Extract and normalize external javascript files from HTML
    '''

    tree = BeautifulSoup(source_code, 'html.parser')
    scripts = [normalize_url(domain, s.get('src')) for s in tree.find_all('script') if s.get('src')]
    return list(set(scripts))
</code></pre>
<p>This piece of code extracts all links from script tags present in the HTML file. I used BeautifulSoup to do the HTML parsing for me. You might notice that there is a mysterious <code>normalize_url</code> function. I will explain it shortly.</p>
<pre><code>def extract_links(domain, source_code):
    '''
    Extract and normalize links in HTML file 
    '''

    tree = BeautifulSoup(source_code, 'html.parser')
    hrefs = [normalize_url(domain, s.get('href')) for s in tree.find_all('a') if s.get('href')]
    return list(set(hrefs))
</code></pre>
<pre><code>def extract_styles(domain, source_code):
    '''
    Extract and normalize CSS in HTML file 
    '''

    tree = BeautifulSoup(source_code, 'html.parser')
    hrefs = [normalize_url(domain, s.get('href')) for s in tree.find_all('link') if s.get('href')]
    return list(set(hrefs))
</code></pre>
<p>There should not be anything surprising; we are doing extraction analogously to script tags.</p>
<pre><code>import requests

def extract_cors(url):
    r = requests.get(url, timeout=5)
    if not r.headers.get('Access-Control-Allow-Origin'):
        return []
    cors = r.headers['Access-Control-Allow-Origin'].split(',')
    if '*' in cors:
        # Use your imagination here
        return []
    return cors
</code></pre>
<p>This is something different, but it shouldn't be difficult to understand. Strictly speaking, the CORS specification allows only a single origin (or <code>*</code>) in <em>Access-Control-Allow-Origin</em>, but servers in the wild sometimes emit non-standard values; I assumed <em>','</em> as the delimiter. You can do your own research and make this more robust (let me know what you find).</p>
<pre><code>def normalize_url(domain, src):
	'''
	(Try to) Normalize URL to its absolute form
	'''

	src = src.strip()
	src = src.rstrip('/')

	# Protocol relative URL
	if src.startswith('//'):
		return 'http:{}'.format(src)
	
	# Relative URL with /
	if src.startswith('/'):
		return 'http://{}{}'.format(domain, src)

	# Relative URL with ?
	if src.startswith('?'):
		return 'http://{}/{}'.format(domain, src)

	# Relative URL with ./
	if src.startswith('./'):
		return 'http://{}{}'.format(domain, src[1:])

	# Absolute URL
	if src.startswith('https://') or src.startswith('http://'):
		return src

	# Else let's hope it is relative URL
	return 'http://{}/{}'.format(domain, src)
</code></pre>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><p>This function tries to normalize the given URL into its absolute form. As you may know, the <em>src</em> and <em>href</em> HTML attributes can contain relative addresses as well. The <em>normalize_url</em> function returns a URL in its absolute form, so we can easily extract the domain and other parts from it.</p>
<p>We only need to extract the domain name from our absolute URL. This is a pretty straightforward task in Python:</p>
<pre><code>from urllib.parse import urlparse
def extract_domain(url):
    '''Extracts domain name from given URL'''

    return urlparse(url).netloc
</code></pre>
<p>The extracted domains are now ready to be forwarded into a subdomain takeover verification engine. I talked about creating one <a href="https://0xpatrik.com/subdomain-takeover-candidates/">here</a>. This process should be enough to identify higher-order subdomain takeover bugs. You can view the full snippet <a href="https://gist.github.com/PatrikHudak/2006c50a694cc76ead705c91805df78b">on GitHub</a>.</p>
<p>These bugs are very rare. I came across only one in the past. However, since the process can be easily automated, there is no reason not to include it in your automation workflow. You might get lucky and get one in the future. <strong>Happy hunting.</strong></p>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><p>Check out my other posts about subdomain takeovers:</p>
<ul>
<li><a href="https://0xpatrik.com/subdomain-takeover/">Subdomain Takeover: Thoughts on Risks</a></li>
<li><a href="https://0xpatrik.com/subdomain-takeover-basics/">Subdomain Takeover: Basics</a></li>
<li><a href="https://0xpatrik.com/takeover-proofs/">Subdomain Takeover: Proof Creation for Bug Bounties</a></li>
<li><a href="https://0xpatrik.com/subdomain-takeover-candidates/">Subdomain Takeover: Finding Candidates</a></li>
</ul>
<p>Until next time!</p>
<p><a href="https://twitter.com/0xpatrik">Patrik</a><br>
<a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="false">Follow @0xpatrik</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p><a href="http://buymeacoff.ee/0xpatrik" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Subdomain Takeover: Second Order Bugs" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;"></a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Subdomain Takeover: Identifying Providers]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>Recently, I have come across an <a href="https://github.com/EdOverflow/can-i-take-over-xyz/issues/26">interesting list</a> of domain suffixes used by cloud providers which are vulnerable to <a href="https://0xpatrik.com/subdomain-takeover-basics/">subdomain takeover</a>. Although the list is pretty accurate, it is still a raw list that needs to be further processed. In this post, I will explain how to process this list</p>]]></description><link>https://0xpatrik.com/subdomain-takeover-providers/</link><guid isPermaLink="false">5dd051c7140a9354e2f0163e</guid><dc:creator><![CDATA[Patrik Hudak]]></dc:creator><pubDate>Wed, 19 Sep 2018 10:00:00 GMT</pubDate><media:content url="https://0xpatrik.com/content/images/2018/09/subdomain_takeover.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://0xpatrik.com/content/images/2018/09/subdomain_takeover.jpg" alt="Subdomain Takeover: Identifying Providers"><p>Recently, I have come across an <a href="https://github.com/EdOverflow/can-i-take-over-xyz/issues/26">interesting list</a> of domain suffixes used by cloud providers which are vulnerable to <a href="https://0xpatrik.com/subdomain-takeover-basics/">subdomain takeover</a>. Although the list is pretty accurate, it is still a raw list that needs to be further processed. In this post, I will explain how to process this list correctly and how to identify new vulnerable cloud services using simple methods.</p>
<p><img src="https://0xpatrik.com/content/images/2018/09/header-1.png" alt="Subdomain Takeover: Identifying Providers"></p>
<h2 id="architecture">Architecture</h2>
<p>Firstly, let's recap how a typical cloud provider becomes vulnerable to subdomain takeover. When people create cloud resources, they want them to be accessible through their own domain. After registering, for instance, a new Heroku app, the engine gives you a subdomain in the form <code>APP_NAME.herokudns.com</code>. Heroku recognizes that customers want such applications exposed through their own domain (e.g. <code>herokuapp.example.com</code>), so they provide a way to set a <em>custom domain</em> for the cloud resource.</p>
<p>Due to the technical simplicity, such architecture is often achieved through the use of virtual hosting. You can see the basic idea on the following diagram:</p>
<p><img src="https://0xpatrik.com/content/images/2018/09/heroku_vhost.png" alt="Subdomain Takeover: Identifying Providers"></p>
<p>The reason why virtual hosting is preferred is that providers like Heroku don't want to dedicate a whole instance to one application. That would also mean a dedicated IP address (A record) for each application on <code>*.herokudns.com</code>. It is easier for them to hide the content behind a couple of load balancers and let those decide what content to return based on the <em>HTTP Host</em> header. You can confirm this by resolving two different <code>*.herokudns.com</code> domains and comparing the returned A records. They will be the same.</p>
<p>After the HTTP request arrives to load balancer, it determines what content to return based on HTTP Host header assignment:</p>
<p><img src="https://0xpatrik.com/content/images/2018/09/heroku_vhost_request.png" alt="Subdomain Takeover: Identifying Providers"></p>
<p>Custom domains pose a slight problem in this architecture. The name of the custom domain has to be provided beforehand, so the load balancer knows which application it corresponds to when a request arrives. This is the whole problem of subdomain takeover: <em>having a CNAME set to a cloud provider domain, but not setting this custom domain in your cloud provider portal</em>.</p>
<p>Some cloud services that use virtual hosting setup:</p>
<ul>
<li>Amazon S3</li>
<li>Heroku</li>
<li>Shopify</li>
<li>Readme.io</li>
<li>...</li>
</ul>
<p>On the other hand, there are still some cloud providers that don't rely on virtual hosting but simply create a dedicated instance (and thus IP address) for each cloud resource. The Host header plays no role here, and the target instance accepts any HTTP Host header provided. In other words, we don't need to specify our custom domain beforehand; any domain with the correct CNAME record will work. The prominent cloud provider in this category is <em>Microsoft Azure</em>.</p>
<h2 id="verification">Verification</h2>
<p>Based on this explanation, we can divide the cloud providers into two categories:</p>
<ul>
<li>
<p><strong>Virtual hosting oriented (VHO)</strong> — services that require you to specify the custom domain beforehand.</p>
</li>
<li>
<p><strong>Non-virtual hosting oriented (NVHO)</strong> — services that don't require you to specify the custom domain beforehand.</p>
</li>
</ul>
<p>As we know from the <a href="https://0xpatrik.com/subdomain-takeover-candidates/">previous post</a>, subdomain takeover monitoring is a continuous process. We should aim for creating the automation that provides a high success rate in finding potential takeovers. My approach is to create so-called <strong>signatures</strong> that technically explain each vulnerable cloud service, like so:</p>
<pre><code>{
    &quot;name&quot;: &quot;Ghost&quot;,
    &quot;formats&quot;: [&quot;*.ghost.io&quot;],
    &quot;responses&quot;: [
        &quot;403 Forbidden&quot;,
        &quot;The thing you were looking for is no longer here, or never was&quot;
    ],
    &quot;response_codes&quot;: [404, 403, 302, 200]
}
</code></pre>
<p>We have four main parts here:</p>
<ul>
<li><strong>Name</strong> — This is the identification of the cloud service</li>
<li><strong>Formats</strong> — List that describes various formats of domains that the cloud provider uses. These domains are then used in CNAME records.</li>
<li><strong>Responses</strong> — What <em>strings</em> we are looking for in the HTTP responses of a <strong>vulnerable</strong> domain.</li>
<li><em>(Optional)</em> <strong>Response Codes</strong> — What HTTP response codes we are looking for from a <strong>vulnerable</strong> domain.</li>
</ul>
<p>These signatures are then used by a scanner that takes a list of domain names for verification and checks their HTTP responses against this signature list (this process was explained in the <a href="https://0xpatrik.com/subdomain-takeover-candidates/">previous post</a>). Indeed, <em>Ghost</em> is not the only vulnerable cloud provider out there. The next section describes how to identify whether the provider is vulnerable and how to write such signatures correctly.</p>
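<p>A minimal matcher for such signatures might look like this (the function name and exact parameter shape are my own; fetching the HTTP response and resolving the CNAME are left to the caller):</p>

```python
def matches_signature(signature, body, status_code, cname):
    '''Check one domain's HTTP response against a single provider signature.'''
    # Does the CNAME target match one of the provider's domain formats?
    suffixes = [f.lstrip('*') for f in signature['formats']]  # '*.ghost.io' -> '.ghost.io'
    if not any(cname.rstrip('.').endswith(s) for s in suffixes):
        return False
    # Optional response-code check
    codes = signature.get('response_codes')
    if codes and status_code not in codes:
        return False
    # Fingerprint strings in the response body
    return any(fingerprint in body for fingerprint in signature['responses'])
```

<p>You would loop over your whole signature list for every candidate domain, passing in the response body, status code, and CNAME target you collected earlier.</p>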
<h3 id="virtualhostingoriented">Virtual hosting oriented</h3>
<p>This is the majority of cloud providers out there. A common theme across these providers is that they tend to respond with a predefined error page for any HTTP Host header they don't recognize. Imagine this: you request a Heroku application via some <code>*.herokudns.com</code> domain, but you set the HTTP Host header to a value that is not set in the Heroku portal. As I explained before, Heroku has no way of knowing which application this domain name relates to, unless it is explicitly specified in the Heroku app settings. So how does Heroku respond to such a request?</p>
<p><img src="https://0xpatrik.com/content/images/2018/09/appendix_heroku_error.png" alt="Subdomain Takeover: Identifying Providers"></p>
<p>We can use this to our advantage:</p>
<ol>
<li>Identify domains that have CNAME to <code>*.herokudns.com</code>.</li>
<li>Make HTTP request to it (with correct Host header)</li>
<li>Check if the known <em>error page</em> strings (e.g., <em>&quot;Build something amazing&quot;</em>) are present in the HTTP response body</li>
<li>If yes, there is a potential for subdomain takeover</li>
</ol>
<p>Now that we know the motivation behind these error messages, let's look at how to create a signature. I always like to have a fresh <a href="https://0xpatrik.com/project-sonar-guide/">FDNS dataset</a> handy during this task. If I want to check whether some provider is vulnerable, this is my usual workflow:</p>
<ol>
<li>Identify what domains they use. This can be checked simply by Googling or by creating the cloud resource on your own. Usually a query like <code>&lt;cloud provider name&gt; custom domain</code> will do the job. For Heroku, it is <code>*.herokudns.com</code> and sometimes <code>*.herokuapp.com</code>. Documentation can be found <a href="https://devcenter.heroku.com/articles/custom-domains">here</a>. The <strong>Format</strong> part of the signature is now done.</li>
<li>Grep some CNAMEs of such domain from FDNS: <code>zcat fdns.gz | grep cname | grep herokudns.com</code>. You need at least three domains.</li>
<li>Do DNS (A) resolution for these three domains. If the A records match, you have confirmed a virtual hosting setup.</li>
<li>Now it is time to identify what the error page looks like. This can be simply done by creating an HTTP request and forging Host header to something non-existing, like so: <code>http GET http://&lt;something&gt;.herokudns.com Host:totallynonexistingdomainhere.com</code>.</li>
<li>From the response, identify some unique strings that are used as error messages. <strong>Responses</strong> part of the signature is now done.</li>
<li>Now it's time to finish the signature with <strong>Name</strong> which is trivial.</li>
</ol>
<p>With that, you are pretty much done. What I also like to do is register a free account with the cloud provider I am testing and go through the whole workflow of setting a custom domain name. In some cases, the cloud provider does domain verification (e.g., Squarespace), so the takeover is not possible whatsoever. In other cases, they require you to have an exact CNAME set in your DNS records, which is not always possible.</p>
<blockquote>
<p>This virtual hosting setup and technique are actually what <a href="https://blog.cobaltstrike.com/2017/02/06/high-reputation-redirectors-and-domain-fronting/">Domain Fronting</a> is all about.</p>
</blockquote>
<h3 id="nonvirtualhostingoriented">Non-virtual hosting oriented</h3>
<p>This category is pretty simple to check. When following the workflow described above, point #3 is what decides between VHO and NVHO. If you see that the A records of these domains don't match, there is a high chance that the provider is not using virtual hosting. Fortunately, checking these instances is very simple: all you need to do is check whether the provider-generated domain responds with <code>NXDOMAIN</code> or not. There is no need to write full signatures for NVHO providers; they should be checked separately.</p>
<p><em>Microsoft Azure</em> is the best example of NVHO provider. They use several domains for their services:</p>
<pre><code>&quot;*.cloudapp.net&quot;
&quot;*.cloudapp.azure.com&quot;
&quot;*.azurewebsites.net&quot;
&quot;*.blob.core.windows.net&quot;
&quot;*.azure-api.net&quot;
&quot;*.azurehdinsight.net&quot;
...
</code></pre>
<p>Now, when you see CNAME pointing to one of these domains, you need to check whether the CNAME domain is <code>NXDOMAIN</code> or not. That's it.</p>
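<p>A sketch of this NVHO check in Python: the suffix tuple mirrors the list above, and the resolution test simply treats a resolver error as a likely <code>NXDOMAIN</code>. Note that the standard library cannot distinguish <code>NXDOMAIN</code> from other resolution failures, so a dedicated DNS library would be more precise:</p>

```python
import socket

AZURE_SUFFIXES = (
    '.cloudapp.net',
    '.cloudapp.azure.com',
    '.azurewebsites.net',
    '.blob.core.windows.net',
    '.azure-api.net',
    '.azurehdinsight.net',
)

def is_azure_domain(cname):
    '''Check whether a CNAME target belongs to one of the Azure suffixes.'''
    return cname.rstrip('.').endswith(AZURE_SUFFIXES)

def possibly_takeoverable(cname):
    '''True if the Azure CNAME target does not resolve (takeover candidate).'''
    if not is_azure_domain(cname):
        return False
    try:
        socket.gethostbyname(cname)
        return False  # resolves, so the resource is still claimed
    except socket.gaierror:
        return True   # resolution failed, likely NXDOMAIN
```

<p>Feed <code>possibly_takeoverable</code> every CNAME target you extract from your DNS dataset and alert on the hits.</p>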
<p>For VHO and NVHO, these are not strict rules, and there are indeed exceptions. I provided guidelines which work for the majority of the cloud providers that I tested in the previous months.</p>
<blockquote>
<p>If you do your research in subdomain takeovers, consider contributing to an open-source project called <a href="https://github.com/EdOverflow/can-i-take-over-xyz">can-i-take-over-xyz</a>. I will be adding my research there.</p>
</blockquote>
<p>Check out my other posts about subdomain takeovers:</p>
<ul>
<li><a href="https://0xpatrik.com/subdomain-takeover/">Subdomain Takeover: Thoughts on Risks</a></li>
<li><a href="https://0xpatrik.com/subdomain-takeover-basics/">Subdomain Takeover: Basics</a></li>
<li><a href="https://0xpatrik.com/takeover-proofs/">Subdomain Takeover: Proof Creation for Bug Bounties</a></li>
<li><a href="https://0xpatrik.com/subdomain-takeover-candidates/">Subdomain Takeover: Finding Candidates</a></li>
<li><a href="https://0xpatrik.com/subdomain-takeover-ns/">Subdomain Takeover: Going beyond CNAME</a></li>
</ul>
<p>Until next time!</p>
<p><a href="https://twitter.com/0xpatrik">Patrik</a><br>
<a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="false">Follow @0xpatrik</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p><a href="http://buymeacoff.ee/0xpatrik" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Subdomain Takeover: Identifying Providers" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;"></a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Subdomain Takeover: Going beyond CNAME]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>After writing the last post, I started thinking that I pretty much covered all aspects of subdomain takeover. Recently, I realized that there are no in-depth posts about other than CNAME subdomain takeover. I briefly mentioned NS subdomain takeover in my other posts. The problem is that there are not</p>]]></description><link>https://0xpatrik.com/subdomain-takeover-ns/</link><guid isPermaLink="false">5dd051c7140a9354e2f0163d</guid><dc:creator><![CDATA[Patrik Hudak]]></dc:creator><pubDate>Tue, 28 Aug 2018 11:29:19 GMT</pubDate><media:content url="https://0xpatrik.com/content/images/2018/08/subdomain_takeover.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://0xpatrik.com/content/images/2018/08/subdomain_takeover.jpg" alt="Subdomain Takeover: Going beyond CNAME"><p>After writing the last post, I started thinking that I had pretty much covered all aspects of subdomain takeover. Recently, I realized that there are no in-depth posts about anything other than CNAME subdomain takeover. I briefly mentioned NS subdomain takeover in my other posts. The problem is that there are not many known cases of successful subdomain takeover using NS records. I chose this wording carefully, because NS subdomain takeover is indeed possible! This post covers all the technical aspects so that you can extend your takeover scans with new signatures. Note, however, that NS takeover is a little more difficult to understand than a normal CNAME takeover.</p>
<p>Let's start by explaining how NS takeover differs from traditional CNAME subdomain takeover. I assume you are well-informed about CNAME subdomain takeover (if not, continue <a href="https://0xpatrik.com/subdomain-takeover-basics/">here</a>). In NS subdomain takeover, we want to gain control over the whole DNS zone on the (authoritative) DNS server. As you might know, DNS has a hierarchical structure, and each level can be served by different nameservers:</p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://0xpatrik.com/content/images/2018/08/resolution.png" class="kg-image" alt="Subdomain Takeover: Going beyond CNAME"></figure><!--kg-card-begin: markdown--><p>To make things more clear, let's explain the whole scenario with a simple example. I own a domain called <code>wolframe.eu</code> which I used for testing purposes. These are the (authoritative) nameservers for this domain:</p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://0xpatrik.com/content/images/2018/08/nameservers-1.png" class="kg-image" alt="Subdomain Takeover: Going beyond CNAME"></figure><!--kg-card-begin: markdown--><p>As you can see, the DNS zone is managed by AWS (more specifically AWS Route53). The result we got in <code>dig</code> is a <em>non-authoritative answer</em>. Non-authoritative means that it was not returned by the authoritative DNS servers (one of the four AWS ones in this example), but rather by some middleman. This middleman is the <code>8.8.8.8</code> recursive Google DNS server, as indicated by <code>@8.8.8.8</code> in the <code>dig</code> command. You can get an <em>authoritative answer</em> by changing <code>@8.8.8.8</code> to one of the four AWS servers, such as <code>@ns-1276.awsdns-31.org</code>. This distinction will be crucial in later sections.</p>
<p>This is my DNS zone inside the Route53 portal:</p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://0xpatrik.com/content/images/2018/08/zone.png" class="kg-image" alt="Subdomain Takeover: Going beyond CNAME"></figure><!--kg-card-begin: markdown--><p>There is only one A record for <code>wolframe.eu</code> itself set to <code>1.1.1.1</code>.</p>
<p>Once a DNS request is issued for any of <code>*.wolframe.eu</code>, one of these four AWS nameservers is randomly selected, and the DNS result is returned:</p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://0xpatrik.com/content/images/2018/08/a_record-1.png" class="kg-image" alt="Subdomain Takeover: Going beyond CNAME"></figure><!--kg-card-begin: markdown--><p>Now imagine this. I delete my DNS zone in Route53, but I keep NS records pointing to AWS. Now I get:</p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://0xpatrik.com/content/images/2018/08/servfail-1.png" class="kg-image" alt="Subdomain Takeover: Going beyond CNAME"></figure><!--kg-card-begin: markdown--><p>Oops, not good. Now, if an attacker simply creates a new DNS zone for <code>wolframe.eu</code> in Route53, he can return any DNS response. This is the basic premise of NS subdomain takeover. There are multiple caveats to the scenario I just explained; we cover them in the next sections.</p>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><h2 id="customnameserverdomains">Custom nameserver domains</h2>
<p>The DNS specification does not require nameservers to be on the same domain. Imagine that <code>example.com</code> is served by two nameservers:</p>
<pre><code>ns.gooddomain.com
ns.baddomain.com
</code></pre>
<p><em>If the base domain of canonical domain name of <strong>at least one NS record</strong> is available for registration, the hosted domain name is vulnerable to NS subdomain takeover.</em></p>
<p>In other words, if <code>baddomain.com</code> does not exist, an attacker can register that and have full control over <code>example.com</code> (and even its subdomains). The real question is, does this ever happen? I highly recommend reading <a href="https://thehackerblog.com/the-international-incident-gaining-control-of-a-int-domain-name-with-dns-trickery/index.html">post</a> by Matthew Bryant where he explained how he got a subdomain takeover using precisely this approach. I also highly recommend his project called <a href="https://github.com/mandatoryprogrammer/TrustTrees">TrustTrees</a>. It is used to generate delegation graphs for any domain. He was able to discover more similar problems using this tool.</p>
<p>From the automation perspective, the process is rather straightforward:</p>
<ol>
<li>Resolve all nameservers for the scanned domain</li>
<li>Check if any of the nameservers are available for registration</li>
<li>Alert based on the result from the previous step</li>
</ol>
<p>Checking only the status in the DNS reply is not a sufficient indicator. The problem is that multiple servers are authoritative for a given domain: if one nameserver is not working, you still get <code>NOERROR</code> because another server is working just fine. DNS resolution does this quietly in the background. That's why tools like <em>TrustTrees</em> come in handy in this type of analysis.</p>
<!--kg-card-end: markdown--><!--kg-card-begin: markdown--><h2 id="manageddns">Managed DNS</h2>
<p>For DNS zones hosted on third-party providers like AWS, the story is a little bit different. The nameserver domains usually won't be available for registration. As I showed in the example above, there might be a case where NS records point to a hosting provider, but the DNS zone is not claimed in its portal. In 2016, Matthew Bryant <a href="https://thehackerblog.com/the-orphaned-internet-taking-over-120k-domains-via-a-dns-vulnerability-in-aws-google-cloud-rackspace-and-digital-ocean/index-2.html">managed to take over 120k domains</a> which had NS records set to one of the DNS providers but were not claimed inside it. I highly recommend reading that post.</p>
<p>From the automation perspective, the process is a little bit different. It also varies from provider to provider, because each of them can have a different signature for non-existing zones. Let's take DigitalOcean, for instance. My site (<code>0xpatrik.com</code>) is not hosted on DigitalOcean. Let's try to query its nameservers for it.</p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://0xpatrik.com/content/images/2018/08/0xpatrik.png" class="kg-image" alt="Subdomain Takeover: Going beyond CNAME"></figure><!--kg-card-begin: markdown--><p>As we can see, the DigitalOcean nameserver returns the <code>REFUSED</code> DNS status. If the nameservers for <code>0xpatrik.com</code> were set to <code>ns1.digitalocean.com</code>, subdomain takeover would be possible.</p>
<p>As described in Matthew's post, AWS generates a new set of nameservers for each DNS zone, so for a successful PoC, you will need to brute-force these nameservers (but it is possible).</p>
<p>I don't write signatures for each DNS provider. Cases of NS subdomain takeover are so rare (in the scopes I am interested in) that I don't mind the few false positives each month. Let's now look at the automation process for hosted DNS:</p>
<ol>
<li>For a domain, extract its nameservers</li>
<li>Resolve the domain against each of these nameservers separately</li>
<li>Look for responses like <code>SERVFAIL</code> or <code>REFUSED</code> and alert on them</li>
</ol>
<p>You can easily achieve this using <code>dnspython</code> and custom resolvers like so:</p>
<pre><code>import dns.rdatatype
import dns.resolver

for NAMESERVER_IP in NAMESERVER_IPS:
    custom_resolver = dns.resolver.Resolver()
    custom_resolver.nameservers = [NAMESERVER_IP]
    try:
        q = custom_resolver.query(DOMAIN, dns.rdatatype.A)
    except dns.resolver.NoNameservers:
        # raised when every queried server answers SERVFAIL/REFUSED
        print(NAMESERVER_IP, 'cannot resolve', DOMAIN)
</code></pre>
<p>Overall, your automation should be extended with the following process:</p>
<!--kg-card-end: markdown--><figure class="kg-card kg-image-card"><img src="https://0xpatrik.com/content/images/2018/08/ns_automation-2.png" class="kg-image" alt="Subdomain Takeover: Going beyond CNAME"></figure><!--kg-card-begin: markdown--><p>Check out my other posts about subdomain takeovers:</p>
<ul>
<li><a href="https://0xpatrik.com/subdomain-takeover/">Subdomain Takeover: Thoughts on Risks</a></li>
<li><a href="https://0xpatrik.com/subdomain-takeover-basics/">Subdomain Takeover: Basics</a></li>
<li><a href="https://0xpatrik.com/takeover-proofs/">Subdomain Takeover: Proof Creation for Bug Bounties</a></li>
<li><a href="https://0xpatrik.com/subdomain-takeover-candidates/">Subdomain Takeover: Finding Candidates</a></li>
</ul>
<p>Until next time!</p>
<p><a href="https://twitter.com/0xpatrik">Patrik</a><br>
<a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="false">Follow @0xpatrik</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p><a href="http://buymeacoff.ee/0xpatrik" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Subdomain Takeover: Going beyond CNAME" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;"></a></p>
<!--kg-card-end: markdown--></content:encoded></item><item><title><![CDATA[Subdomain Takeover: Finding Candidates]]></title><description><![CDATA[Subdomain takeover monitoring is a continuous process. People are often surprised by it. Domains are often working perfectly, but once the administrator removes ...]]></description><link>https://0xpatrik.com/subdomain-takeover-candidates/</link><guid isPermaLink="false">5dd051c7140a9354e2f0163b</guid><dc:creator><![CDATA[Patrik Hudak]]></dc:creator><pubDate>Tue, 21 Aug 2018 10:00:00 GMT</pubDate><media:content url="https://0xpatrik.com/content/images/2018/08/candidates.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://0xpatrik.com/content/images/2018/08/candidates.jpg" alt="Subdomain Takeover: Finding Candidates"><p>After my <a href="https://0xpatrik.com/subdomain-takeover-starbucks-ii/">last</a> <a href="https://0xpatrik.com/subdomain-takeover-starbucks/">posts</a>, many of you have asked me how I found the domains for Starbucks. Although I am not willing to give you exact resources, in this post, I want to describe at least the approach I take.</p>
<p><img src="https://0xpatrik.com/content/images/2018/08/hacktivity-1.png" alt="Subdomain Takeover: Finding Candidates"></p>
<p>When I received the bounties for subdomain takeover, I was happy and excited, but also felt sorry in some way for other bounty hunters. Since I have automation in place, it usually takes just a few minutes to create a PoC and write a report. I received $2,000 for that. Other bounty hunters spend countless hours trying to create a working exploit and receive only a fraction of that bounty. I still don't consider myself a bug bounty hunter, but rather a <strong>low-hanging fruit hunter</strong>.</p>
<p>The very first thing you need to understand is that subdomain takeover monitoring is a continuous process. People are often surprised by it. Domains often work perfectly, but once the administrator removes the resource the CNAME is pointing to, the vulnerability is introduced. In my <a href="https://0xpatrik.com/subdomain-takeover-starbucks-ii/">second report</a>, the domain was fully working for several months. It became <em>&quot;vulnerable&quot;</em> at some point in time, and 20 minutes later, I already had a working PoC. If you want to find subdomain takeovers consistently, you need to create automation for yourself. No exceptions.</p>
<h1 id="process">Process</h1>
<p>As I said above, my code for automation won't be released to the public (yet). However, the process I am going to describe should be more than enough to replicate my automation and come close enough to its results. The process is divided into four main parts:</p>
<ol>
<li>Scope parsing</li>
<li>Subdomain enumeration</li>
<li>Subdomain takeover verification</li>
<li>Notifications</li>
</ol>
<p><img src="https://0xpatrik.com/content/images/2018/08/process.png" alt="Subdomain Takeover: Finding Candidates"></p>
<h2 id="scopeparsing">Scope Parsing</h2>
<p>You need to have a list of scopes for which you want to monitor subdomain takeovers. Bug bounty program specifications/rules are your friend here. If we take <a href="https://hackerone.com/twitter">Twitter</a> for instance, they have a clearly defined scope:</p>
<p><img src="https://0xpatrik.com/content/images/2018/08/twitter.png" alt="Subdomain Takeover: Finding Candidates"></p>
<p>In contrast, <a href="https://hackerone.com/intel">Intel</a> excluded all reports for <em>intel.com</em>:</p>
<p><img src="https://0xpatrik.com/content/images/2018/08/intel.png" alt="Subdomain Takeover: Finding Candidates"></p>
<p>There is also a grey zone. Some programs, like <a href="https://hackerone.com/starbucks">Starbucks</a>, don't have explicitly listed domain wildcards, but rather specify the scope differently:</p>
<p><img src="https://0xpatrik.com/content/images/2018/08/starbucks.png" alt="Subdomain Takeover: Finding Candidates"></p>
<p>It is up to you whether you want to create automation for that. Keep in mind that scopes tend to change, as acquisitions happen and new programs are introduced. You can take inspiration from <a href="https://twitter.com/EdOverflow">EdOverflow's</a> <a href="https://github.com/EdOverflow/megplus/blob/master/fetch.php">script</a>. The output of this step should be the list of domains on which you want to perform subdomain enumeration, like this:</p>
<pre><code>twitter.com
vine.co
twimg.com
pscp.tv
periscope.tv
starbucks.com
starbucks.ca
...
</code></pre>
<p><img src="https://0xpatrik.com/content/images/2018/08/parsing.png" alt="Subdomain Takeover: Finding Candidates"></p>
<h2 id="subdomainenumeration">Subdomain Enumeration</h2>
<p>I have written several posts about subdomain enumeration. The only thing you need to do is enumerate every domain from the previous step. Which tools and resources to use is the real <em>secret sauce</em>; however, I highly recommend starting with <a href="https://github.com/OWASP/Amass">amass</a>.</p>
<pre><code>amass -nodns -norecursive -noalts -d twitter.com
</code></pre>
<p>As you might know, many tools also return domains which no longer exist (e.g., NXDOMAIN, SERVFAIL, ...). Personally, <strong>I recommend NOT filtering domains</strong> and keeping the ones responding with a DNS status other than NOERROR. I will explain why in the next section. The output of this step should be a list of all subdomains of all domains from the previous step.</p>
<p><img src="https://0xpatrik.com/content/images/2018/08/enumeration.png" alt="Subdomain Takeover: Finding Candidates"></p>
<h2 id="subdomaintakeoververification">Subdomain Takeover Verification</h2>
<p>Once you have the full domain list, you need to monitor for subdomain takeover continuously. The easiest way is to set up <em>signatures</em> or <em>rules</em> for true positives. As I explained in <a href="https://0xpatrik.com/takeover-proofs/">this</a> post, every cloud provider has its own error message. For CloudFront, it looks like this:</p>
<p><img src="https://0xpatrik.com/content/images/2018/08/cloudfront.png" alt="Subdomain Takeover: Finding Candidates"></p>
<p>There is no need to inspect that visually. You can automate it by creating HTTP requests (with the correct Host header!) to the domains and then looking for the unique string in the HTTP response. Find some inspiration <a href="https://github.com/haccer/subjack/blob/master/subjack/subjack.go#L239">here</a>. You should also take into consideration that some cloud providers require just DNS status verification (such as Azure). There is no need to create HTTP requests there.</p>
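<p>This matching step can be sketched with the Python standard library alone. The fingerprint strings and helper names below are my illustrative assumptions, so verify them against each provider's current error page before relying on them:</p>

```python
import urllib.error
import urllib.request

# Illustrative fingerprints only; each provider's error page can change.
FINGERPRINTS = {
    "cloudfront.net": "The request could not be satisfied",
    "github.io": "There isn't a GitHub Pages site here",
}


def match_signature(cname, body):
    """Return True if the HTTP body carries the 'unclaimed resource'
    string for the provider the CNAME points to."""
    for provider, signature in FINGERPRINTS.items():
        if cname.rstrip(".").endswith(provider):
            return signature in body
    return False


def check_takeover(subdomain, cname):
    """Fetch the subdomain (urllib derives the Host header from the URL)
    and look for the provider's signature, even in error responses."""
    try:
        body = urllib.request.urlopen(f"http://{subdomain}/", timeout=5).read()
    except urllib.error.HTTPError as e:
        body = e.read()  # signature pages often ship with 4xx statuses
    except OSError:
        return False
    return match_signature(cname, body.decode(errors="replace"))
```

<p>Keeping the signature matching in a pure function makes it trivial to add providers and to test the rules offline.</p>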
<p>The reason why I said that you should NOT filter domains in the previous step is that subdomain takeover verification does not end with CNAME subdomain takeover. As I briefly mentioned in the <a href="https://0xpatrik.com/subdomain-takeover-basics/">subdomain takeover basics post</a>, there are also NS and MX subdomain takeovers. Long story short, when a domain responds with NXDOMAIN, it might be a case of NS subdomain takeover. Many people forget about this possibility since there are not many cases of NS or MX subdomain takeover. That being said, there is at least <a href="https://thehackerblog.com/the-international-incident-gaining-control-of-a-int-domain-name-with-dns-trickery/index.html">one</a> documented. If you want to take subdomain takeover seriously, I highly recommend reading that post at least twice.</p>
<p>The diagram at the beginning of this post showed two independent loops. I recommend designing your automation similarly. The loop of subdomain enumeration is usually much slower than the loop of takeover verification. You want to run each loop continuously and feed results from step two to step three without any intervention.</p>
<p><img src="https://0xpatrik.com/content/images/2018/08/verification.png" alt="Subdomain Takeover: Finding Candidates"></p>
<h2 id="notifications">Notifications</h2>
<p>The last step, but just as important as the others, is notification. Once a subdomain takeover is discovered, you have to act on it as fast as possible. The field is becoming more competitive, and there is a chance that the same domain was discovered by somebody else. The rule of subdomain takeover: whoever creates a PoC first wins. Ideally, you want to set up <a href="https://aws.amazon.com/sns/">push notifications</a>, even to your phone. There is also the possibility of creating automation that produces the PoC itself.</p>
<p>Check out my other posts about subdomain takeovers:</p>
<ul>
<li><a href="https://0xpatrik.com/subdomain-takeover/">Subdomain Takeover: Thoughts on Risks</a></li>
<li><a href="https://0xpatrik.com/subdomain-takeover-basics/">Subdomain Takeover: Basics</a></li>
<li><a href="https://0xpatrik.com/takeover-proofs/">Subdomain Takeover: Proof Creation for Bug Bounties</a></li>
</ul>
<p>Until next time!</p>
<p><a href="https://twitter.com/0xpatrik">Patrik</a><br>
<a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="false">Follow @0xpatrik</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p><a href="http://buymeacoff.ee/0xpatrik" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Subdomain Takeover: Finding Candidates" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;"></a></p>
<!--kg-card-end: markdown--></content:encoded></item><item><title><![CDATA[Subdomain Takeover: Yet another Starbucks case]]></title><description><![CDATA[Recently, I was repeatedly awarded $2,000 bounty for subdomain takeover on Starbucks. In this post, I explain the step-by-step process for the proof-of-concept.]]></description><link>https://0xpatrik.com/subdomain-takeover-starbucks-ii/</link><guid isPermaLink="false">5dd051c7140a9354e2f0163a</guid><dc:creator><![CDATA[Patrik Hudak]]></dc:creator><pubDate>Wed, 01 Aug 2018 10:00:00 GMT</pubDate><media:content url="https://0xpatrik.com/content/images/2018/08/subdomain-takeover.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://0xpatrik.com/content/images/2018/08/subdomain-takeover.jpg" alt="Subdomain Takeover: Yet another Starbucks case"><p>Recently, I was repeatedly <a href="https://hackerone.com/reports/388622">awarded</a> a $2,000 bounty for subdomain takeover on <a href="https://hackerone.com/starbucks">Starbucks</a>. You may remember my <a href="https://0xpatrik.com/subdomain-takeover-starbucks/">post</a> about a bug bounty report where I described how a subdomain takeover was possible using Azure. This case was pretty similar. However, I had to use another Azure service called <em>Traffic Manager</em>. In this post, I explain the step-by-step process for the proof of concept.</p>
<p><img src="https://0xpatrik.com/content/images/2018/08/report.png" alt="Subdomain Takeover: Yet another Starbucks case"></p>
<p>On Monday evening, I noticed that <code>wfmnarptpc.starbucks.com</code> responds with <code>NXDOMAIN</code>. The more interesting fact was that it had a CNAME set to <code>s00149tmppcrpt.trafficmanager.net</code>. From experience, I knew that this had a perfect chance of being a subdomain takeover. As you may remember, Azure mostly uses dedicated IP addresses, so when a CNAME to Azure responds with <code>NXDOMAIN</code>, your bug bounty radar should be on.</p>
<p>I haven't previously mentioned that <code>trafficmanager.net</code> is also one of the domains where subdomain takeover is possible. Let's look at what <a href="https://docs.microsoft.com/en-us/azure/traffic-manager/traffic-manager-overview">Traffic Manager</a> is about:</p>
<p><em>&quot;Microsoft Azure Traffic Manager allows you to control the distribution of user traffic for service endpoints in different datacenters. [...] You can also use Traffic Manager with external, non-Azure endpoints.&quot;</em></p>
<pre><code>$ dig a wfmnarptpc.starbucks.com

; &lt;&lt;&gt;&gt; DiG 9.10.6 &lt;&lt;&gt;&gt; a wfmnarptpc.starbucks.com
;; global options: +cmd
;; Got answer:
;; -&gt;&gt;HEADER&lt;&lt;- opcode: QUERY, status: NXDOMAIN, id: 20251
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;wfmnarptpc.starbucks.com.	IN	A

;; ANSWER SECTION:
wfmnarptpc.starbucks.com. 33165	IN	CNAME	s00149tmppcrpt.trafficmanager.net.
</code></pre>
<p>Simply put, there is some domain which has a CNAME link to a non-existing subdomain of <code>trafficmanager.net</code>. To prove our point, we need to register the previously removed asset in Azure. Thankfully (for us), Azure does not do any domain ownership verification :-)</p>
<p>You may <a href="https://0xpatrik.com/subdomain-takeover-basics/">remember</a> that this situation is still not the winning point, since there might be a disabled configuration for this subdomain in Azure. In that case, even though it <em>externally</em> seems that takeover is possible, PoC creation would fail.</p>
<p>I started by creating a new Traffic Manager profile in the Azure portal:</p>
<p><img src="https://0xpatrik.com/content/images/2018/08/trafficmanager-create.png" alt="Subdomain Takeover: Yet another Starbucks case"></p>
<p>Nice! At this point, I knew that the subdomain takeover was possible. The <code>s00149tmppcrpt.trafficmanager.net</code> domain was available; I could take it and proceed with the PoC. Now I needed to point the domain to one of the servers that I own:</p>
<p><img src="https://0xpatrik.com/content/images/2018/08/endpoint.png" alt="Subdomain Takeover: Yet another Starbucks case"></p>
<p>The only thing left was to create a new <a href="https://www.nginx.com/resources/wiki/start/topics/examples/server_blocks/">virtual host</a> on my endpoint:</p>
<p><img src="https://0xpatrik.com/content/images/2018/08/vhost-1.png" alt="Subdomain Takeover: Yet another Starbucks case"></p>
<p>Time well spent. Thank you very much.</p>
<p><img src="https://0xpatrik.com/content/images/2018/08/poc.png" alt="Subdomain Takeover: Yet another Starbucks case"></p>
<p>Until next time!</p>
<p><a href="https://twitter.com/0xpatrik">Patrik</a><br>
<a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="false">Follow @0xpatrik</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p><a href="http://buymeacoff.ee/0xpatrik" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Subdomain Takeover: Yet another Starbucks case" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;"></a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[OSINT Primer: Organizations (Part 3)]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>In the previous posts, I cover a lot of things around OSINT. There are, however, still some techniques and ideas which were kept untold. I kept them for this post because I feel they are mostly related to organizations. That being said, don't let this stop you from using them</p>]]></description><link>https://0xpatrik.com/osint-organizations/</link><guid isPermaLink="false">5dd051c7140a9354e2f01635</guid><dc:creator><![CDATA[Patrik Hudak]]></dc:creator><pubDate>Thu, 26 Jul 2018 22:27:33 GMT</pubDate><media:content url="https://0xpatrik.com/content/images/2018/07/osint-organizations-2.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://0xpatrik.com/content/images/2018/07/osint-organizations-2.jpg" alt="OSINT Primer: Organizations (Part 3)"><p>In the previous posts, I cover a lot of things around OSINT. There are, however, still some techniques and ideas which were kept untold. I kept them for this post because I feel they are mostly related to organizations. That being said, don't let this stop you from using them elsewhere. This post demonstrates them in the context of organizations. As always, I present you with some limited framework:</p>
<ul>
<li>I am interviewing for/joining/doing business with the organization; I want to find some information around them.</li>
<li>I am doing security assessment/bug bounty on the organization; I want to find some technical details.</li>
</ul>
<h2 id="employeereviews">Employee Reviews</h2>
<p><em>An organization is as good as its employees</em>. With that being said, employees also like to write anonymous reviews of their companies. This is particularly useful when you are considering joining a company or trying to find out what salary to ask for. The most popular site for that is <a href="https://www.glassdoor.com/Reviews/index.htm">Glassdoor Company Reviews</a>. Note that you will be required to log in before viewing all reviews. A similar site for reviews is <a href="https://www.indeed.com/companies">Indeed</a>.</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/glassdoor-1.png" alt="OSINT Primer: Organizations (Part 3)"></p>
<p>(Review of <em>Apple Inc.</em> on <a href="https://www.glassdoor.com/Reviews/index.htm">Glassdoor.com</a>)</p>
<p>Business registry information about a company tends to be specific to its country of registration, so I decided not to focus on it heavily in this post. There are aggregators such as <a href="https://opencorporates.com/">opencorporates.com</a>. I recommend checking the <em>Business Records</em> section of the <a href="http://osintframework.com/">OSINT framework</a> for specific search providers.</p>
<h2 id="technicalstack">Technical Stack</h2>
<p>Through the eyes of a pentester, knowing the technology stack of the organization is a valuable thing. You want to maximize your efforts, so knowing, for instance, which antivirus or outbound proxy the company is using might help you structure your attack. I like to do multiple things to find this out.</p>
<ul>
<li>
<p><em>Look for job postings</em>. They usually include the required skills or experience for the position. Look for technical positions. You can see job postings in several ways. One good option is <a href="https://www.linkedin.com/jobs/">LinkedIn Jobs</a>. There are also job posting sites like <a href="https://www.indeed.com/">Indeed</a> or <a href="https://www.monster.com/">Monster Job</a>. I recommend using a Google dork to find all possible sites: <code>&quot;&lt;ORG_NAME&gt;&quot; intext:career | intext:jobs | intext:job posting</code>. Often, the company lists job postings on its own website. The idea behind this technique is simple: organizations tend to stay consistent and deploy the same products company-wide.</p>
</li>
<li>
<p>Similar to the previous technique, <em>look for (technical) employees of the organization</em> (check <a href="https://0xpatrik.com/osint-people/">the previous post</a>) on LinkedIn. They will most likely have their certifications and skills up to date. Beware that the certifications could have been acquired at previous gigs, so I usually use this information to cross-validate the other methods.</p>
</li>
<li>
<p><em>Check <a href="https://stackshare.io/stacks">stackshare.io</a></em>. Some (mainly technical) companies share their stacks publicly.</p>
</li>
<li>
<p><em>Use search engines</em>. You should not limit yourself to job postings. There might be questions on <a href="http://stackoverflow.com/">StackOverflow</a> or other similar sites from employees about specific products. This step will require more in-depth OSINT.</p>
</li>
<li>
<p><em>Metadata</em>. Organizations often share documents publicly on their websites. You can leverage the fact that popular business products such as Microsoft Office or Adobe Reader append metadata to files by default. This metadata contains things like the author's name, dates, and most importantly the software type and version. You can target an old version of some software with a client-side exploit. The best part is that since the metadata often contains the author's name, you know who your primary target is. If you are interested in this topic, I highly recommend <a href="https://blog.sweepatic.com/metadata-hackers-best-friend/">this post</a> written by <a href="https://twitter.com/intense_feel">Martin Carnogursky</a>.</p>
</li>
<li>
<p><em>Fingerprint on your own</em>. This part is a little tricky. Mainly, you want to find out what is running inside the network without being in the network (note that external scans will most likely not tell you what web proxy is being deployed). There is, however, one method for that, although I would still label it as <em>experimental</em>. The method is called <strong>DNS Cache Snooping</strong>.</p>
</li>
</ul>
<p>The basic idea is this: you check the organization's DNS cache to see whether there were previous requests to a specific domain. Why is it useful? Imagine an antivirus. It downloads new signatures periodically. For instance, some McAfee updates come from the domain <code>download.nai.com</code>. You can use a non-recursive DNS request to the organization's DNS server to check whether this domain is in its cache or not. However, the external and internal DNS servers must be shared (or at least their cache), and even then, a cache hit might be only a false positive. That's why I called it <em>experimental</em>. Let's look at a diagram that might make things easier for you (or not):</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/cache_snooping.png" alt="OSINT Primer: Organizations (Part 3)"></p>
<p>You can execute the non-recursive DNS query using this <code>dig</code> command:</p>
<pre><code>dig @DNS_SERVER -t A DOMAIN_TO_CHECK +norecurse
</code></pre>
<p>By <em>external</em> DNS server, I primarily mean the DNS servers serving the organization's website(s), in other words:</p>
<pre><code>dig -t NS MAIN_DOMAIN
</code></pre>
<p>Another problem is that you need to know the domains of products (a.k.a. snooping signatures). There are projects such as <a href="https://github.com/felmoltor/DNSSnoopDogg">DNSSnoopDogg</a>; however, they have not been updated for some time.</p>
<h2 id="publicsecrets">Public Secrets</h2>
<p>The last topic of this post, and probably my favorite, is public secrets. It is incredible what organizations share publicly without realizing it. This post has in fact already covered two categories: <em>Metadata</em> and <em>Exposed services</em>.</p>
<p>There are, however, other types of public secrets. Firstly, there are secrets committed to git repositories. This usually happens by accident when developers work with code that has API keys or passwords hardcoded. When such code is committed to a git repository, it stays in its history even if the secret is later deleted (not purged). There are two projects which I use for scanning git repositories: <a href="https://github.com/zricethezav/gitleaks">gitleaks</a> and <a href="https://github.com/dxa4481/truffleHog">truffleHog</a>.</p>
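<p>A minimal version of what these scanners do can be sketched with the standard library. The AWS access key ID format (<code>AKIA</code> plus 16 characters) is documented by AWS, but real tools combine many such patterns with entropy checks; the function names here are mine:</p>

```python
import re
import subprocess

# AWS access key IDs follow a documented format: "AKIA" + 16 chars.
AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")


def find_aws_keys(text):
    """Return all AWS access key IDs found in the given text."""
    return AWS_KEY.findall(text)


def scan_repo_history(repo_path):
    """Grep the full git history, including content deleted in later
    commits, which is exactly where leaked secrets tend to hide."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all"],
        capture_output=True, text=True,
    ).stdout
    return find_aws_keys(log)
```

<p>Scanning <code>git log -p --all</code> rather than the working tree is the key trick: it catches secrets that were committed once and removed in a later commit.</p>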
<p>Similarly, paste sites are a gold mine for secret data. Developers tend to share code using these sites, and they don't tend to think about the security aspect, often submitting code with secrets included. If you want to dig deep into this, I recommend <a href="https://github.com/kevthehermit/PasteHunter">PasteHunter</a>. It is a project which periodically checks popular paste sites and runs <a href="https://virustotal.github.io/yara/">YARA</a> signatures to check whether they contain interesting strings. Alternatively, you can use ad-hoc methods such as Google dorks: <code>site:pastebin.com ORG_DOMAIN</code>.</p>
<p>Lastly, I want to mention public <a href="https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingBucket.html">S3 buckets</a>. Recently, there were <a href="https://www.scmagazine.com/open-aws-s3-bucket-exposes-sensitive-experian-and-census-info-on-123-million-us-households/article/720067/">multiple</a> <a href="https://www.infosecurity-magazine.com/news/fedex-s3-bucket-exposes-private/">cases</a> where sensitive information was hosted in public S3 buckets. An S3 bucket can be configured as public, which developers often opt for because it is easier to work with. The problem is that once the bucket name is discovered, all its content can be seen by anybody without authentication. As for tools, I recommend <a href="https://github.com/eth0izzle/bucket-stream">bucket-stream</a>, which is a CLI tool. I also highly recommend a new tool which can be seen as a <em>Shodan for S3</em>: <a href="https://buckets.grayhatwarfare.com/">buckets.grayhatwarfare.com</a>.</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/shodans3.png" alt="OSINT Primer: Organizations (Part 3)"></p>
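<p>A rough unauthenticated bucket check can be sketched with the standard library. The assumptions here: an anonymously listable bucket returns an XML <code>ListBucketResult</code>, a private one returns 403, a non-existing one 404; bucket policies and region redirects can change this, and the function names are mine:</p>

```python
import urllib.error
import urllib.request


def classify(status, body):
    """Map an unauthenticated S3 response to a rough verdict."""
    if status == 200 and "<ListBucketResult" in body:
        return "public (listable)"
    if status == 403:
        return "exists but not listable"
    if status == 404:
        return "does not exist"
    return "unknown"


def check_bucket(name):
    """Request the bucket listing anonymously and classify the answer."""
    url = f"http://{name}.s3.amazonaws.com/"
    try:
        resp = urllib.request.urlopen(url, timeout=5)
        return classify(resp.status, resp.read().decode(errors="replace"))
    except urllib.error.HTTPError as e:
        return classify(e.code, e.read().decode(errors="replace"))
```

<p>Feeding candidate names (from certificate transparency, permutations of the company name, and so on) through such a check is essentially what bucket-stream automates.</p>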
<p>Part 4 of OSINT Primer will deal with <em>Certificates</em>. Stay tuned on <a href="https://twitter.com/0xpatrik">Twitter</a> to get it first.</p>
<p>Parts in this series:</p>
<p><a href="https://0xpatrik.com/osint-domains/">OSINT Primer: Domains</a><br>
<a href="https://0xpatrik.com/osint-people/">OSINT Primer: People</a><br>
<a href="https://0xpatrik.com/osint-organizations/">OSINT Primer: Organizations</a></p>
<p>Until next time!</p>
<p><a href="https://twitter.com/0xpatrik">Patrik</a><br>
<a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="false">Follow @0xpatrik</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p><a href="http://buymeacoff.ee/0xpatrik" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="OSINT Primer: Organizations (Part 3)" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;"></a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[OSINT Primer: People (Part 2)]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>In this post, continuing with OSINT related topics, we will look at researching people. Similarly to <a href="https://0xpatrik.com/osint-domains/">domains</a>, there are some specific goals during our &quot;person analysis&quot;:</p>
<ul>
<li>The person is our new acquaintance. We want to find some information about him/her.</li>
<li>You want to hire a new employee.</li></ul>]]></description><link>https://0xpatrik.com/osint-people/</link><guid isPermaLink="false">5dd051c7140a9354e2f01636</guid><dc:creator><![CDATA[Patrik Hudak]]></dc:creator><pubDate>Wed, 18 Jul 2018 19:09:21 GMT</pubDate><media:content url="https://0xpatrik.com/content/images/2018/07/osint-people.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://0xpatrik.com/content/images/2018/07/osint-people.jpg" alt="OSINT Primer: People (Part 2)"><p>In this post, continuing with OSINT related topics, we will look at researching people. Similarly to <a href="https://0xpatrik.com/osint-domains/">domains</a>, there are some specific goals during our &quot;person analysis&quot;:</p>
<ul>
<li>The person is our new acquaintance. We want to find some information about him/her.</li>
<li>You want to hire a new employee. Besides standard background check by HR, you might want to perform your OSINT to see, whether he/she will be a good candidate.</li>
<li>You want to pitch a new business product to a high-profile individual in some particular company. You need to get his/her e-mail address or mobile number first. <em>Note that this step usually includes some organizational research, which will be covered in the next parts of this series.</em></li>
<li>You are a penetration tester currently doing a spear-phishing assessment. You need to find information to increase the potential success of the phishing campaign.</li>
</ul>
<p>As you can see, there are multiple situations where OSINT about people might come in handy. Let's look at some specific techniques.</p>
<p><strong>Note: The techniques in this guide should NOT be used for malicious purposes. Although it is hard to write a guide without the methods that can be used in malicious scenarios, I am not responsible for such actions.</strong></p>
<h2 id="opsec">OPSEC</h2>
<p>Before we begin, I want to mention something important. You may have heard about <em>operational security</em> or <em><a href="https://en.wikipedia.org/wiki/Operations_security">OPSEC</a></em>. During your searches, you might expose yourself in many different ways. For example, if you are logged in to LinkedIn and visit another profile, that person will be notified about it (including your name). OPSEC can thus be seen as a set of guidelines for a specific context. These guidelines prevent the other person from learning either that OSINT is being performed on them or YOUR true identity.</p>
<p>I recommend using the <a href="https://www.torproject.org/projects/torbrowser.html.en">Tor Browser Bundle</a> for all OSINT-related activities. Firstly, your IP identity is hidden because your traffic is routed through several encrypted relays, a.k.a. the <em>onion</em>. Secondly, the customized version of Firefox ensures that cookies are deleted between restarts, so no past activities can be fingerprinted. You should never log in to your own accounts using Tor.</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/tor-bundle.png" alt="OSINT Primer: People (Part 2)"></p>
<p>If the speed of Tor is problematic for you, you can opt to use a <a href="https://torrentfreak.com/vpn-services-keep-anonymous-2018/">VPN service</a> combined with some safe browsing environment. Again, you don't want to expose the cookies in any way. The simplest option is to use <em>Private Mode</em> in a browser, although I would go as far as to recommend using a <a href="https://inteltechniques.com/buscador/">virtual machine dedicated to OSINT</a>.</p>
<h2 id="socialmedia">Social Media</h2>
<p>The first thing I like to do is find the social media profiles. Why? I believe that they hold the majority of useful OSINT information. You should expect multiple false positives when dealing with popular names such as <em>John Smith</em>.</p>
<p>Finding social media profiles might be difficult or easy, depending on the privacy habits and <em>consistency</em> of your target. A simple Google search will sometimes do the trick:</p>
<pre><code>John Doe site:facebook.com
John Doe site:instagram.com
John Doe site:linkedin.com
...
</code></pre>
<p>Note that for several social media sites, you will need an account to see the full profile. I recommend creating a fake account for this purpose.</p>
<p>People tend to reuse their usernames across different services. The username acts as a unique identifier of a person across the Internet. I like to extract the username from the person's Instagram. Then, I use a service called <a href="https://namechk.com/">namechk</a> to search for this username across many different social media platforms.</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/namechk.png" alt="OSINT Primer: People (Part 2)"></p>
<p>I have to say that I haven't registered an account on all of the platforms indicated, so you should expect some false positives, as always.</p>
<p>There are also aggregators such as <a href="https://pipl.com/">pipl.com</a> or <a href="https://www.social-searcher.com/">social-searcher</a>. Although I don't always use them, they tend to provide a more high-level view of the person.</p>
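<p>The username-reuse idea above is easy to script. Below is a minimal sketch (the platform list and the <code>exists</code> heuristic are my own illustrations; services like namechk cover far more sites, and many sites return HTTP 200 even for missing profiles, so treat any result as a hint, not proof):</p>

```python
import urllib.request

# Hypothetical subset of platforms -- namechk checks many more.
PROFILE_URL_TEMPLATES = {
    "github": "https://github.com/{u}",
    "instagram": "https://www.instagram.com/{u}/",
    "twitter": "https://twitter.com/{u}",
}

def profile_urls(username: str) -> dict:
    """Build candidate profile URLs for one username across platforms."""
    return {site: tpl.format(u=username) for site, tpl in PROFILE_URL_TEMPLATES.items()}

def exists(url: str) -> bool:
    """Heuristic check: treat HTTP 200 as 'profile exists'. Expect false positives."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return resp.status == 200
    except Exception:
        return False

for site, url in profile_urls("johndoe").items():
    print(site, url)
```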
<h2 id="prospects">Prospects</h2>
<p>This is a topic that could fit both here and in organizational OSINT. I decided to put it here since I think it relates more to people than to organizations themselves.</p>
<p>When dealing with sales, it is often necessary to have a good contact in the organization to pitch to. You should be smart about it and decide which person is the right choice. It is certainly not a good idea to pitch an infosec-related product to the CEO of a Fortune 500 company; a CISO or Senior Security Engineer might be a better choice there. On the other hand, pitching a security-related product to the CEO of a startup makes more sense.</p>
<p>I don't want to describe the full cold calling/e-mailing approach here; I want to explain the best ways to find the contact information of key people in your target organization. I highly recommend checking <a href="https://clearbit.com/prospector">Clearbit Prospector</a> for this task. It is a great product with accurate data. Although it is a paid product, I honestly think it is worth the money if you are doing a lot of such searches. Granted, it is not OSINT in the true sense; other techniques are explained below.</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/clearbit-1.png" alt="OSINT Primer: People (Part 2)"></p>
<p>Another product that I like is <a href="https://hunter.io/">Hunter.io</a>. Again, it is a paid product, but you get 100 searches per month for free. Similar services are <a href="https://www.voilanorbert.com/">voilanorbert</a> and <a href="https://headreach.com/">headreach</a>; they also operate on a freemium model. I think this is the best way of finding e-mail addresses for specific people in organizations.</p>
<p>Alternatively, you can use LinkedIn for the initial cold message. Simply use the <em>Search bar</em> with the query <code>&lt;COMPANY_NAME&gt; &lt;POSITION&gt;</code> to get the most accurate results. People tend to keep their LinkedIn profiles updated, and you can then connect with them easily (although not via e-mail). A similar thing applies to Twitter; however, I would say it is a less formal medium.</p>
<p><strong>Pro tip</strong>: For smaller companies, the e-mails are usually listed directly on their website.</p>
<p>For some useful Google Dorks about cold calling, I recommend <a href="https://apollonsky.me/growth-hacking-google-dork/">this article</a>.</p>
<h2 id="telephonenumbers">Telephone Numbers</h2>
<p>Finding telephone numbers is trickier than finding e-mails. My go-to approach is either Google, where I dork using the person's name combined with <code>telephone number</code> keywords, or the White/Yellow Pages of the particular country, such as <a href="https://www.whitepages.com/">whitepages.com</a> for the US. Make sure to check <a href="https://github.com/jivoi/awesome-osint">awesome-osint</a> for a list of telephone number search services.</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/whitepages.png" alt="OSINT Primer: People (Part 2)"></p>
<p>Sometimes I need to perform a reverse telephone search: given the telephone number, I want to retrieve the name of its owner. This is useful when you have missed calls that you want to correlate to an owner. You could use the services described above, but a more universal approach is as follows:</p>
<ol>
<li>Try the Facebook search with the number. The owner should come up in the results if he/she has the number associated with the profile.</li>
<li>Save the number in your phone and look at your Viber or WhatsApp contact list. These services allow users to set a photo and display name, and this information can be extracted just by knowing the telephone number.</li>
</ol>
<h2 id="emails">E-mails</h2>
<p>I described finding e-mails in the <em>Prospects</em> section above. Now I want to extend that and cover the more technical side of e-mail OSINT. SMTP supports two not very well-known commands, <a href="https://cr.yp.to/smtp/vrfy.html">VRFY and EXPN</a>. The former verifies directly with the mail server whether a particular mailbox exists. This is particularly useful when you are already familiar with the standard e-mail address format of the company. The formats are usually:</p>
<pre><code>lastname@company.tld
firstname.lastname@company.tld
firstletterfirstname.lastname@company.tld
</code></pre>
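<p>These candidate formats are trivial to generate programmatically before verifying them (a minimal sketch; the helper name is my own):</p>

```python
def candidate_emails(first: str, last: str, domain: str) -> list:
    """Generate e-mail candidates in the common corporate formats."""
    first, last = first.lower(), last.lower()
    return [
        f"{last}@{domain}",            # doe@company.tld
        f"{first}.{last}@{domain}",    # john.doe@company.tld
        f"{first[0]}{last}@{domain}",  # jdoe@company.tld
        f"{first[0]}.{last}@{domain}", # j.doe@company.tld
    ]

print(candidate_emails("John", "Doe", "company.tld"))
```

Each candidate can then be checked with VRFY or a service like MailTester.com, as described below.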
<p>The easiest way to test this is to use an online tool such as <a href="http://mailtester.com/testmail.php">MailTester.com</a>. Don't be distracted by the dated design of this website; it does an outstanding job compared to other similar services. Note that not every SMTP server allows this command, but from experience, even high-profile organizations sometimes have it turned on (ehm, Apple).</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/apple.png" alt="OSINT Primer: People (Part 2)"><br>
<img src="https://0xpatrik.com/content/images/2018/07/apple-1.png" alt="OSINT Primer: People (Part 2)"></p>
<p>The <em>EXPN</em> command lists the members of a distribution list: the SMTP server returns the individual e-mail addresses of the list's members.</p>
<p>I also like to test whether an e-mail address is present in some dump of leaked credentials. The easiest way is to use <a href="https://haveibeenpwned.com/">Have I Been Pwned</a> by <a href="https://twitter.com/troyhunt">Troy Hunt</a>. This is useful in combination with a security assessment. Why? People tend to reuse passwords, and there is a chance that the password the user chose on the breached service is also used in the corporate environment.</p>
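<p>For passwords specifically, the public Pwned Passwords range API can be queried with k-anonymity: only the first five hex characters of the password's SHA-1 hash ever leave your machine. A minimal sketch (breached-account lookups are a separate feature of the website; the helper names here are mine):</p>

```python
import hashlib
import urllib.request

def hash_parts(password: str) -> tuple:
    """Split the uppercase SHA-1 hex digest into the 5-char prefix
    sent to the API and the suffix that is kept locally (k-anonymity)."""
    digest = hashlib.sha1(password.encode()).hexdigest().upper()
    return digest[:5], digest[5:]

def pwned_count(password: str) -> int:
    """Return how many times the password appears in known breach dumps."""
    prefix, suffix = hash_parts(password)
    url = f"https://api.pwnedpasswords.com/range/{prefix}"
    with urllib.request.urlopen(url) as resp:
        # Each line is "HASH_SUFFIX:COUNT"; match our suffix locally.
        for line in resp.read().decode().splitlines():
            cand, count = line.split(":")
            if cand == suffix:
                return int(count)
    return 0
```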
<p>The purpose of this guide is not to flood you with as many tools as possible; rather, I try to explain different techniques. If you want to use different tools, you can find the right one <a href="https://github.com/jivoi/awesome-osint">here</a>. Note that some of the tools work only in specific countries.</p>
<p>Parts in this series:</p>
<p><a href="https://0xpatrik.com/osint-domains/">OSINT Primer: Domains</a><br>
<a href="https://0xpatrik.com/osint-people/">OSINT Primer: People</a><br>
<a href="https://0xpatrik.com/osint-organizations/">OSINT Primer: Organizations</a></p>
<p>Until next time!</p>
<p><a href="https://twitter.com/0xpatrik">Patrik</a><br>
<a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="false">Follow @0xpatrik</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p><a href="http://buymeacoff.ee/0xpatrik" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="OSINT Primer: People (Part 2)" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;"></a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Finding Phishing: Tools and Techniques]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>Phishing is still one of the most prominent ways of how cyber adversaries monetize their actions. Generally, phishing tries to accomplish two primary goals:</p>
<ul>
<li><strong>Gain initial access to network</strong> — Adversary sends spear phishing e-mail with a well-crafted pretext and malicious attachment. Adversary then waits until the victim opens the attachment</li></ul>]]></description><link>https://0xpatrik.com/phishing-domains/</link><guid isPermaLink="false">5dd051c7140a9354e2f01638</guid><dc:creator><![CDATA[Patrik Hudak]]></dc:creator><pubDate>Mon, 09 Jul 2018 20:41:13 GMT</pubDate><media:content url="https://0xpatrik.com/content/images/2018/07/phishing.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://0xpatrik.com/content/images/2018/07/phishing.jpg" alt="Finding Phishing: Tools and Techniques"><p>Phishing is still one of the most prominent ways of how cyber adversaries monetize their actions. Generally, phishing tries to accomplish two primary goals:</p>
<ul>
<li><strong>Gain initial access to a network</strong> — The adversary sends a spear-phishing e-mail with a well-crafted pretext and a malicious attachment, then waits until the victim opens the attachment and connects to the C2 server. The attachment is usually one of the Office file formats in combination with VBScript/WScript/PowerShell and has a pretty high success rate in evading anti-virus (when done correctly).</li>
<li><strong>Steal credentials for online services</strong> — The adversary clones a high-value website (usually the login form of some web service) and convinces the user to enter their credentials. The adversary then gathers the credentials and uses them for further (malicious) actions. The cloned website is delivered to the victim through different channels, depending on the adversary's &quot;audience&quot;. Generally, e-mail is a prominent delivery channel for this type of phishing as well; there is no attachment this time, but the pretext is crafted so that it contains a URL to the malicious domain. Other channels include WhatsApp, Facebook Messenger, etc.</li>
</ul>
<p><img src="https://0xpatrik.com/content/images/2018/07/phishing_process.png" alt="Finding Phishing: Tools and Techniques"></p>
<p>When dealing with incident response, the general workflow for analyzing phishing is in most cases <em>reactive</em>. In other words, incident responders wait until some tech-savvy employee/customer reports the suspicious phishing site or suspicious e-mail, or until suspicious traffic starts showing up in the SIEM. By that time, it is likely that other employees have already been compromised.</p>
<p>In this post, I want to focus on the latter category and describe how you can proactively find domains that try to mimic the websites of some particular organization (read: <em>yours</em>). I will focus on:</p>
<ul>
<li><em>Websites mimicking intranet</em> sites to get valid internal network credentials.</li>
<li><em>Websites mimicking product</em> sites to get valid credentials from <strong>your customers</strong>. These can result in brand damage, since the phishing site is associated with your brand.</li>
</ul>
<p>By proactively detecting these sites, you can be prepared to face phishing waves <em>inside</em> your organization and notify your customers about malicious actions hopefully before they happen.</p>
<p>Please note that if you want to get results from these techniques, you should implement a <strong>continuous, automated process</strong> rather than a one-time search. By no means should you expect a perfect process for this problem; instead, we are trying to use heuristics to find a significant amount of badness with the least amount of false positives.</p>
<p><em>I will dedicate a whole post to explaining the details of attachment-based spear phishing, including macro creation and some PowerShell tricks.</em></p>
<h2 id="domains">Domains</h2>
<p>Firstly, let's look at the general scheme of finding such domains/websites. Since countless websites are created every day, it would be pretty hard to check them all; we need to limit the scope somehow. Usually, an adversary creates a new domain that at least looks somewhat legitimate:</p>
<ul>
<li><strong>Typosquatting domain</strong> — <a href="https://en.wikipedia.org/wiki/Typosquatting">Typosquatting</a> is the technique of registering domain names which look similar to some legitimate domain name. For instance, given <em>google.com</em>, one example of a typosquatting domain might be <em>g00gle.com</em><br>
(notice the &quot;zero&quot; instead of &quot;o&quot;). Such a domain name appears nearly identical to the original one. There is a <a href="https://zeltser.com/domain-name-variations-in-phishing/">large list</a> of typosquatting techniques.</li>
<li><strong>Doppelganger domain</strong> — A <a href="https://en.wikipedia.org/wiki/Doppelganger_domain">doppelganger domain</a> is similar to a typosquatting domain: it is a domain which is missing the &quot;.&quot; (dot) in the domain name. For example, a doppelganger domain for <em>mail.google.com</em> is <em>mailgoogle.com</em> (notice the missing dot). When the content on such a domain matches the branding and content of the original website, users are not able to tell the difference and are more likely to be tricked by an attacker (e.g., for credential harvesting or financial fraud).</li>
<li><strong>Domain with keyword present</strong> — A newer trend in phishing domains is that adversaries create long, gibberish domain names but include a keyword of the targeted brand inside the domain name. Because of the keyword and the (almost always present) SSL certificate, <a href="https://www.wired.com/story/phishing-schemes-use-encrypted-sites-to-seem-legit/">users believe</a> that this is indeed a legitimate site. This should by no means be considered a rule; there are certainly domains that clone some brand but don't include any such keyword in the FQDN.</li>
</ul>
<p>From my own experience, the last category is the most prominent. One of the reasons is that many typosquatted domains for popular brands are already registered, so an adversary needs to be more creative and is left with typosquatting domains which are easily spotted as suspicious by a majority of users.</p>
<p>On the other hand, a long domain name with the brand name present as a keyword is &quot;<em>less</em>&quot; suspicious than a weirdly typosquatted domain. <em>Keep in mind that these are my opinions about this problem.</em></p>
<h2 id="findingtyposquatting">Finding typosquatting</h2>
<p>There is probably just one tool you need for detecting typosquatting: <a href="https://github.com/elceef/dnstwist">dnstwist</a>. It is a Python package that will enumerate all typosquatting possibilities and present you with a nice report. The list of domains can then be fed to <a href="https://github.com/FortyNorthSecurity/EyeWitness">EyeWitness</a> to see whether some domain is hosting something potentially malicious.</p>
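<p>To give a feel for what dnstwist enumerates, here is a heavily simplified sketch of a few permutation classes (my own toy code, not dnstwist's actual algorithms):</p>

```python
def typo_permutations(domain: str) -> set:
    """Generate a few classes of typosquatting candidates
    for a second-level domain like 'google.com'."""
    name, tld = domain.rsplit(".", 1)
    candidates = set()
    for i in range(len(name)):
        # omission: drop one character (gogle.com)
        candidates.add(name[:i] + name[i + 1:] + "." + tld)
        # repetition: double one character (gooogle.com)
        candidates.add(name[:i] + name[i] + name[i:] + "." + tld)
    # homoglyphs: visually similar replacements (g00gle.com)
    lookalikes = {"o": "0", "l": "1", "i": "1", "e": "3"}
    for i, ch in enumerate(name):
        if ch in lookalikes:
            candidates.add(name[:i] + lookalikes[ch] + name[i + 1:] + "." + tld)
    candidates.discard(domain)
    return candidates

print(len(typo_permutations("google.com")))
```

dnstwist additionally covers transpositions, adjacent-key typos, bitsquatting, alternative TLDs, and more, and resolves each candidate for you.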
<p>Excerpt from dnstwist running on <code>paypal.com</code>:<br>
<img src="https://0xpatrik.com/content/images/2018/07/dnstwist-1.png" alt="Finding Phishing: Tools and Techniques"></p>
<p>Alternatively, there are online services like <a href="https://dnstwist.it/">dnstwist.it</a> which do a similar thing directly in your browser.</p>
<p><img src="https://0xpatrik.com/content/images/2020/08/dnstwister.png" alt="Finding Phishing: Tools and Techniques"></p>
<h2 id="findinglongdomainswiththekeyword">Finding Long Domains with the Keyword</h2>
<p><em>Before going forward, I recommend reading <a href="https://0xpatrik.com/censys-guide/">Censys Guide</a> that I wrote before.</em></p>
<p>The technique in this section is possible because of two things that emerged in recent years:</p>
<ul>
<li>Free CA's such as <a href="https://letsencrypt.org/">Let's Encrypt</a></li>
<li><a href="https://www.certificate-transparency.org/what-is-ct">Certificate Transparency</a></li>
</ul>
<p>The basic premise is this: since TLS/SSL certificates are now free to obtain, adversaries in most cases issue one for the malicious domain. The <strong>&quot;Secure&quot;</strong> badge in the URL bar makes things look more legitimate.</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/paypal-phishing.png" alt="Finding Phishing: Tools and Techniques"></p>
<p><em>(Picture taken shamelessly from <a href="https://github.com/x0rz/phishing_catcher">https://github.com/x0rz/phishing_catcher</a>)</em></p>
<p><strong>How can we leverage this fact?</strong></p>
<p>Certificate Transparency (CT) is a project that collects the majority of certificates ever issued and provides this data to the public. By parsing CT logs, we can easily extract domain names (from the Subject/SAN fields) and make a simple syntactic comparison of keywords inside these domain names.</p>
<p>Firstly, we need access to Certificate Transparency logs. There are numerous services offering this, including <a href="https://certdb.com/">certdb</a> and <a href="https://crt.sh/">crt.sh</a>. I don't use either of these two because they don't offer advanced query options. I recommend using <a href="https://0xpatrik.com/censys-guide/">Censys</a> for querying CT logs.</p>
<p>Secondly, we need to gather a list of potential keywords we want to search for. The most common keyword would be the full domain name of some company/service (e.g., apple.com). You shouldn't stop there, but also look for variations like stripping the TLD (<em>apple</em>). Beware that, depending on the keyword, the result set might contain a couple of false positives which you will need to sort out manually at first. I also like to filter by a particular CA (mainly Let's Encrypt), but only in cases where the result set is too big. An example query to find phishing domains for <em>apple.com</em> would then be:</p>
<p><code>(apple.com*) AND parsed.issuer.organization.raw:&quot;Let's Encrypt&quot;</code></p>
<p>You can also limit the results by specifying an issue date range, like so:</p>
<p><code>parsed.validity.start: [2018-07-09 TO *]</code></p>
<p><img src="https://0xpatrik.com/content/images/2018/07/cert_listing.png" alt="Finding Phishing: Tools and Techniques"></p>
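<p>If you run these searches regularly, composing the query string can be scripted. A minimal sketch using the field names from the queries above (the helper function and its defaults are my own):</p>

```python
def censys_ct_query(keyword: str, issuer: str = "Let's Encrypt",
                    since: str = "2018-07-09") -> str:
    """Compose a Censys certificate-search query: brand keyword,
    filtered by issuing CA and certificate validity start date."""
    return (f"({keyword}*) "
            f'AND parsed.issuer.organization.raw:"{issuer}" '
            f"AND parsed.validity.start: [{since} TO *]")

print(censys_ct_query("apple.com"))
```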
<p>As an alternative to using Censys as the primary source of CT data, you can leverage a fantastic project called <a href="https://medium.com/cali-dog-security/introducing-certstream-3fc13bb98067">CertStream</a>. CertStream provides a real-time stream of newly issued certificates, which you can use to detect specified keywords in (near) real time. In fact, there is a project called <a href="https://github.com/x0rz/phishing_catcher">phishing_catcher</a> that does just that. The major downside of this approach, from my perspective, is a lot of unnecessary work when you are interested in only one brand: you need to run CertStream continuously, whereas Censys can be queried anytime with more advanced query options.</p>
<p>Screenshot from <em>phishing_catcher</em>:</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/phishing_catcher.png" alt="Finding Phishing: Tools and Techniques"></p>
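<p>The keyword matching itself is only a few lines. The sketch below operates on CertStream's documented message format; the real-time subscription (left commented out) needs the certstream client, and the keyword list is just an example:</p>

```python
KEYWORDS = ("paypal", "apple.com")  # example brand keywords -- pick your own

def flag_domains(message: dict) -> list:
    """Return the domains in a CertStream 'certificate_update'
    message that contain one of the watched keywords."""
    if message.get("message_type") != "certificate_update":
        return []
    domains = message["data"]["leaf_cert"]["all_domains"]
    return [d for d in domains if any(k in d.lower() for k in KEYWORDS)]

# Real-time usage (pip install certstream):
# import certstream
# certstream.listen_for_events(
#     lambda msg, ctx: print(flag_domains(msg)),
#     url="wss://certstream.calidog.io/")
```

phishing_catcher goes further and scores each domain (entropy, suspicious TLDs, Levenshtein distance) instead of doing a plain substring match.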
<p>There is also an option of creating your own certificates database using projects like <a href="https://github.com/CaliDog/Axeman">Axeman</a>.</p>
<p>Lastly, I want to point out why certificates are ideal for finding such domains. Long domains usually contain multiple levels of subdomains. An alternative idea is to gather a list of newly registered domains for some TLDs (<a href="https://www.whoxy.com/newly-registered-domains/">Whoxy</a> provides such a service) and check these domains for the keywords. The problem is that these are second-level domains - you need to perform <a href="https://0xpatrik.com/asset-discovery/">subdomain enumeration</a> first to discover all potentially malicious domains. Another problem is that since such domain names contain a lot of gibberish, subdomain enumeration techniques are likely to fail (unless they use CT sources).</p>
<h2 id="puttingittogether">Putting It Together</h2>
<p>As always, I would like to automate this boring process. It is sometimes useful to combine both techniques:</p>
<ol>
<li>Run <a href="https://github.com/elceef/dnstwist">dnstwist</a> on some domain</li>
<li>Resolve the typosquatting domains</li>
<li>Strip TLD and find them in Censys</li>
</ol>
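<p>The three steps above can be glued together with a couple of helpers. A sketch (it assumes dnstwist is installed; the actual invocation is left commented out, and the helper names are mine):</p>

```python
import socket

def resolves(domain: str) -> bool:
    """Step 2: keep only the typosquats that currently resolve."""
    try:
        socket.gethostbyname(domain)
        return True
    except OSError:
        return False

def strip_tld(domain: str) -> str:
    """Step 3: strip the TLD to get the keyword for a Censys CT search."""
    return domain.rsplit(".", 1)[0]

# Step 1 (assumes the dnstwist CLI is installed):
# import subprocess
# twists = subprocess.run(["dnstwist", "--format", "list", "example.com"],
#                         capture_output=True, text=True).stdout.split()
# live = [d for d in twists if resolves(d)]
# keywords = [strip_tld(d) for d in live]
```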
<p>High-level, this is the process that I follow (dotted line represents alternative approach):</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/finding_phishing.png" alt="Finding Phishing: Tools and Techniques"></p>
<p>Until next time!</p>
<p><a href="https://twitter.com/0xpatrik">Patrik</a><br>
<a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="false">Follow @0xpatrik</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p><a href="http://buymeacoff.ee/0xpatrik" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Finding Phishing: Tools and Techniques" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;"></a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[OSINT Primer: Domains (Part 1)]]></title><description><![CDATA[The post doesn't explain the enumeration part for finding related domains, but rather finding domain-specific data such as owner, reputation, or DNS settings.]]></description><link>https://0xpatrik.com/osint-domains/</link><guid isPermaLink="false">5dd051c7140a9354e2f01634</guid><dc:creator><![CDATA[Patrik Hudak]]></dc:creator><pubDate>Sun, 08 Jul 2018 16:06:56 GMT</pubDate><media:content url="https://0xpatrik.com/content/images/2018/07/osint-domains.jpeg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://0xpatrik.com/content/images/2018/07/osint-domains.jpeg" alt="OSINT Primer: Domains (Part 1)"><p>In this post, I explain how I approach finding as much information about the domain as possible. The post doesn't solve the <em>enumeration</em> part for finding related domains (as I explained <a href="https://0xpatrik.com/asset-discovery/">here</a>), but rather finding domain-specific data such as owner, reputation, or DNS settings. The post is aimed towards <strong>everybody working on threat intel, malware analysis, bug bounty, journalism, and many similar areas</strong>.</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/domains-1.png" alt="OSINT Primer: Domains (Part 1)"></p>
<p><em>Note: This is the first version (July '18) of this <strong>OSINT Primer</strong>. I will be progressively updating it with new tools and techniques.</em></p>
<p>Before I dig into specific techniques/tools, I want to talk about my mindset briefly. Usually, I have one of these goals in mind:</p>
<ul>
<li>The domain is the <em>primary domain</em> of my target, and I want to get as much information as I can. Note that in this case I usually go with <em>gather-everything-use-something</em>.</li>
<li>The domain is likely <em>malicious</em>; I want to confirm my hypothesis and see what it is about.</li>
<li>The domain seems like a potential attack vector for gaining an initial foothold. Specifically, it is hosting some services which can be exploited, and I want to see details about these services.</li>
</ul>
<p>Remember that you should always have a clear goal in mind. This prevents you from doing unnecessary stuff.</p>
<p>Note that there is a slight overlap between <em>domains</em> and the <em>services</em> present on them. In some cases, I will explain techniques which apply to <em>services</em> as well.</p>
<h2 id="whois">WHOIS</h2>
<p>The very first technique that should be in your arsenal is a WHOIS lookup. WHOIS is used for querying databases that store the registrants of domain names, IP blocks, or ASNs. You can use the CLI tool:</p>
<pre><code>$ whois DOMAIN
</code></pre>
<p>or opt to some web service such as <a href="https://whois.icann.org/en">ICANN WHOIS</a>.</p>
<p>WHOIS data provides information about the entity that registered the domain. Remember that some domains might have WHOIS information hidden, and some might provide false data.</p>
<p>WHOIS data provides a clue as to whether the domain is tied to some specific organization or not. Although this is more useful in the <em>enumeration</em> step, WHOIS data can help in some specific situations - for instance, if you encounter a domain which tries to mimic some specific organization and the WHOIS records are not tied to that organization, this is a huge red flag.</p>
<h2 id="domainprofiling">Domain Profiling</h2>
<p>Sometimes, you want to get an overall picture of the domain's information or of the decisions that the domain administrator made.</p>
<p>One of my go-to tools is <a href="https://www.robtex.com/dns-lookup/eff.org">Robtex's DNS lookup</a>. It provides a massive amount of information about the domain. I especially like the <em>Shared</em> sections, which give you an overview of other related domains (yes, this is related to the enumeration phase, forgive me  for once :-))</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/robtex-shared-1.png" alt="OSINT Primer: Domains (Part 1)"></p>
<p>Robtex provides much more information (e.g., SEO details, reputation, ...), though usually in a limited scope, so I try to use other sources for specific details. <em>Robtex provides me with a high-level perspective</em>. I highly recommend creating a (free) account there so you can leverage more advanced functionality.</p>
<p>Next, I like to use <a href="https://github.com/eldraco/domain_analyzer">domain_analyzer</a> for in-depth technical information about domain settings. This tool is a literal beast. It can even crawl the websites to discover e-mails and much more. I like to use it in a more limited way, like so:</p>
<pre><code>python domain_analyzer.py -d DOMAIN -w -j -n -a
</code></pre>
<p>It is hard to tell upfront which data from this output will be useful. However, it has helped me multiple times in the past. I like to store the output and come back to it several times during the analysis.</p>
<h2 id="passivedata">Passive Data</h2>
<p>It is useful to check what the domain was serving in the past. There are two types of passive domain data:</p>
<ul>
<li><strong>Passive DNS</strong> — What were the values of DNS records in the past</li>
<li><strong>Passive &quot;content&quot;</strong> — What the web server on this particular domain was hosting in the past</li>
</ul>
<p>For Passive DNS, I like to use <a href="https://community.riskiq.com/home">RiskIQ Community Edition</a>. The interface is super simple, and search results will show you the passive data immediately:</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/riskiq-community.png" alt="OSINT Primer: Domains (Part 1)"></p>
<p>Although RiskIQ CE is meant to be an overall analysis platform for domains, I use it exclusively to get passive DNS data. As in many other areas of this post, it is up to you to decide whether you stick with one source or use multiple sources for different data. I prefer the latter approach, since each provider is usually reliable in just one area.</p>
<p>Next to RiskIQ CE, I like to use <a href="https://www.virustotal.com/">VirusTotal</a> as well:</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/virustotal.png" alt="OSINT Primer: Domains (Part 1)"></p>
<p>Again, you will get more data than just passive DNS. From my own experience, RiskIQ tends to provide more data for passive DNS.</p>
<p>Lastly, I will mention <a href="https://www.circl.lu/services/passive-dns/">passive DNS from CIRCL.LU</a> which I am lucky to have access to. I sometimes use it to cross-correlate the above two sources. Note that CIRCL.LU passive DNS is not open to the public.</p>
<p>More Passive DNS sources:</p>
<ul>
<li><a href="https://passivedns.mnemonic.no/search">mnemonic</a></li>
<li><a href="https://securitytrails.com/dns-trails">DNSTrails</a></li>
</ul>
<p>My go-to tool for passive content is the <a href="https://web.archive.org/">Wayback Machine</a>. It holds snapshots of most websites from the past. There are usually multiple snapshots, so you can even choose the date of the snapshot you want to see:</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/wayback-machine-1.png" alt="OSINT Primer: Domains (Part 1)"></p>
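<p>Snapshot lookups can also be scripted against the Wayback Machine's documented availability API (<code>archive.org/wayback/available</code>); the helper names in this sketch are mine:</p>

```python
import json
import urllib.request

def closest_snapshot(api_response: dict):
    """Pull the closest snapshot URL out of an availability-API response,
    or None if the URL was never archived."""
    snap = api_response.get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap else None

def wayback_lookup(url: str, timestamp: str = "") -> dict:
    """Query the availability API; 'timestamp' is YYYYMMDD to pick
    the snapshot nearest that date."""
    api = f"https://archive.org/wayback/available?url={url}&timestamp={timestamp}"
    with urllib.request.urlopen(api) as resp:
        return json.load(resp)

# snapshot = closest_snapshot(wayback_lookup("eff.org", "20180701"))
```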
<p>The frequency of snapshots depends on website popularity. Next, I like to use a simple Google dork to retrieve the last content of some URL from Google's cache, like so:</p>
<pre><code>cache:https://eff.org/
</code></pre>
<p><img src="https://0xpatrik.com/content/images/2018/07/google-cache.png" alt="OSINT Primer: Domains (Part 1)"></p>
<h2 id="contentanalysis">Content Analysis</h2>
<p>You might want to check what the web server on the domain is currently serving. When dealing with a potential malware site, it is necessary to follow basic OPSEC guidelines: you should NEVER visit such a site directly without at least some protection, such as a VPN or even a dedicated virtual machine. I like to use a service called <a href="https://urlscan.io/">urlscan.io</a>, which makes the HTTP request on your behalf and provides you with a screenshot and other information you can use in your analysis.</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/urlscan.png" alt="OSINT Primer: Domains (Part 1)"></p>
<p>Sometimes you want to detect visual changes on a website. This is useful when a domain is currently parked and might change to something different in the future. For this purpose, I like to use <a href="https://visualping.io/">visualping.io</a>. This service automatically notifies you once the content of a domain changes. An open-source alternative is called <a href="https://github.com/thp/urlwatch">urlwatch</a>.</p>
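<p>The core idea behind such change monitors can be sketched in a few lines: fetch the page, fingerprint its content, and compare against the previously stored fingerprint. This is a minimal illustration of the concept, not how visualping.io or urlwatch is actually implemented; the function names and sample HTML are my own.</p>

```python
import hashlib

def content_fingerprint(html: str) -> str:
    # Hash the page body; any content change produces a different digest.
    return hashlib.sha256(html.encode("utf-8")).hexdigest()

def has_changed(previous_fingerprint: str, current_html: str) -> bool:
    # Compare a stored fingerprint against freshly fetched content.
    return content_fingerprint(current_html) != previous_fingerprint

# Hypothetical baseline taken while the domain was parked.
baseline = content_fingerprint("<html>parked domain</html>")
print(has_changed(baseline, "<html>parked domain</html>"))   # False
print(has_changed(baseline, "<html>new phishing kit</html>"))  # True
```

<p>A real monitor would fetch the page on a schedule and usually normalize dynamic parts (timestamps, ads) before hashing, otherwise every check reports a change.</p>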
<p>From the content perspective, short URLs are often used to mask a malware/phishing domain during delivery to the victim. The tool <a href="http://checkshorturl.com/">checkshorturl.com</a> automatically expands a short URL to its original form.</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/short-url.png" alt="OSINT Primer: Domains (Part 1)"></p>
<p>Related to content, I usually want to check the technology stack of a website. I use <a href="https://www.wappalyzer.com/">Wappalyzer</a> as a browser plugin. Wappalyzer automatically recognizes the technologies on each website you browse to:</p>
<p><img src="https://0xpatrik.com/content/images/2018/07/wappalyzer.png" alt="OSINT Primer: Domains (Part 1)"></p>
<p>Results from Wappalyzer enable me to then fire up vulnerability scanning tools such as <a href="https://github.com/droope/droopescan">droopescan</a> in this case. If you'd rather use a CLI-based tool, I recommend <a href="https://github.com/WeiChiaChang/stacks-cli">stacks-cli</a>.</p>
<h2 id="trafficanalytics">Traffic Analytics</h2>
<p>After content analysis, I want to check how popular the website (on the domain) is on the Web. I use several SEO analytics tools for this:</p>
<ul>
<li><a href="https://www.similarweb.com/website/">SimilarWeb</a></li>
<li><a href="https://analytics.moz.com/pro/link-explorer/home">moz Link Explorer</a></li>
<li><a href="https://www.semrush.com/info/">SEMRush</a></li>
<li><a href="http://moonsearch.com/">moonsearch</a></li>
<li><a href="https://www.alexa.com/siteinfo/">Alexa</a></li>
</ul>
<p><img src="https://0xpatrik.com/content/images/2018/07/similarweb.png" alt="OSINT Primer: Domains (Part 1)"></p>
<h2 id="reputation">Reputation</h2>
<p>During incident response or malware analysis, it is often necessary to check the reputation of a domain. The reputation might give you a clue whether the domain is known to be associated with malicious activity. There are numerous (free) services providing this information. You should always check multiple sources, as the strategies for categorizing domains differ from vendor to vendor. Reputation often goes hand-in-hand with categorization. Domain categorization is determined by the content the (web) server is hosting. The categories can then be used for many purposes, such as web traffic filtering using proxies. Reputation is then derived from the category: low-trust domains fall into categories such as <em>ads</em>, <em>suspicious</em>, <em>malicious</em>, and so on.</p>
<p>Reputation tools that I use the most:</p>
<ul>
<li><a href="https://sitereview.bluecoat.com/">Bluecoat Sitereview</a></li>
<li><a href="https://www.google.com/transparencyreport/safebrowsing/diagnostic/">Google Safebrowsing</a></li>
<li><a href="https://sitecheck.sucuri.net/">Sucuri Sitecheck</a></li>
<li><a href="https://www.threatminer.org/">ThreatMiner</a></li>
<li><a href="https://cymon.io/">CyMon</a></li>
<li><a href="https://www.trustedsource.org/en/feedback/url?action=checksingle">McAfee TrustedSource</a></li>
</ul>
<p><img src="https://0xpatrik.com/content/images/2018/07/bluecoat.png" alt="OSINT Primer: Domains (Part 1)"></p>
<p>There are also domain <em>blacklists</em>, which are lists of domains explicitly categorized as malicious. Tools such as <em>CyMon</em> also look inside these blacklists. An example of such a blacklist is the <a href="https://www.spamhaus.org/lookup/">Spamhaus Domain Blacklist</a>.</p>
<h2 id="osintautomation">OSINT Automation</h2>
<p>As you can see, there are many sources of domain-related data. Manually querying each of these sources can be exhausting during extensive analyses, when you need to gather information about tens or hundreds of domains. Although, to my knowledge, there is currently no tool that can query every source mentioned in this post, my primary OSINT tool at the moment is <a href="https://github.com/Te-k/harpoon">harpoon</a>. It is a super useful tool which can save you a massive amount of time during your analysis. I recommend reading the documentation carefully and checking which sources are available. Example output from <em>harpoon</em> (looking for website snapshots):</p>
<pre><code>p@eternity:~$ harpoon cache https://eff.org
Google: FOUND https://webcache.googleusercontent.com/search?num=1&amp;q=cache%3Ahttps%3A%2F%2Feff.org&amp;strip=0&amp;vwsrc=1 (2018-07-07 13:04:39+00:00)
Yandex: NOT FOUND
Archive.is: FOUND
-2012-12-20 17:36:48+00:00: http://archive.is/20121220173648/https://eff.org/
-2013-09-30 21:30:38+00:00: http://archive.is/20130930213038/http://eff.org/
-2014-01-27 14:55:32+00:00: http://archive.is/20140127145532/https://eff.org/
-2014-03-18 07:18:52+00:00: http://archive.is/20140318071852/http://eff.org/
-2014-03-29 01:59:16+00:00: http://archive.is/20140329015916/http://eff.org/
-2014-10-12 13:29:16+00:00: http://archive.is/20141012132916/http://eff.org/
-2014-11-18 05:30:31+00:00: http://archive.is/20141118053031/http://eff.org/
-2014-11-26 00:27:10+00:00: http://archive.is/20141126002710/http://eff.org/
-2015-01-06 05:16:11+00:00: http://archive.is/20150106051611/http://eff.org/
-2015-02-25 23:13:18+00:00: http://archive.is/20150225231318/http://eff.org/
-2015-04-03 12:32:17+00:00: http://archive.is/20150403123217/http://eff.org/
-2015-06-03 17:17:27+00:00: http://archive.is/20150603171727/http://eff.org/
-2017-01-16 17:29:46+00:00: http://archive.is/20170116172946/https://eff.org/
-2017-02-20 20:15:58+00:00: http://archive.is/20170220201558/https://eff.org/
-2017-12-13 05:06:22+00:00: http://archive.is/20171213050622/http://eff.org/
-2017-12-17 21:18:37+00:00: http://archive.is/20171217211837/http://eff.org/
Archive.org: NOT FOUND
Bing: FOUND http://cc.bingj.com/cache.aspx?d=4505675932894641&amp;w=enxY6wdkqMMA8cCOvykvjwxhAM6cEKCx (2018-06-07 00:00:00)
</code></pre>
<p>Alternatives (fewer sources and weaker quality):</p>
<ul>
<li><a href="https://github.com/QTek/QRadio">QRadio</a></li>
<li><a href="https://github.com/1aN0rmus/TekDefense-Automater">Automater</a></li>
</ul>
<p>...or you might use a Swiss-army knife for all areas of OSINT: <a href="https://github.com/DataSploit/datasploit">datasploit</a>. For more tools dealing with domain OSINT, you should also check out the <a href="http://osintframework.com/">OSINT Framework</a>.</p>
<p>Parts in this series:</p>
<p><a href="https://0xpatrik.com/osint-domains/">OSINT Primer: Domains</a><br>
<a href="https://0xpatrik.com/osint-people/">OSINT Primer: People</a><br>
<a href="https://0xpatrik.com/osint-organizations/">OSINT Primer: Organizations</a></p>
<p>Until next time!</p>
<p><a href="https://twitter.com/0xpatrik">Patrik</a><br>
<a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="false">Follow @0xpatrik</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Subdomain Takeover: Basics]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>Although I have written <a href="https://0xpatrik.com/subdomain-takeover-starbucks/">multiple</a> <a href="https://0xpatrik.com/takeover-proofs/">posts</a> about subdomain takeover, I realized that there aren't many posts covering basics of subdomain takeover and the whole &quot;problem statement.&quot; This post aims to explain (in-depth) the entire subdomain takeover problem once again, along with results of an Internet-wide scan that I</p>]]></description><link>https://0xpatrik.com/subdomain-takeover-basics/</link><guid isPermaLink="false">5dd051c7140a9354e2f01637</guid><dc:creator><![CDATA[Patrik Hudak]]></dc:creator><pubDate>Wed, 27 Jun 2018 16:31:48 GMT</pubDate><media:content url="https://0xpatrik.com/content/images/2018/06/subdomain-takeover.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://0xpatrik.com/content/images/2018/06/subdomain-takeover.jpg" alt="Subdomain Takeover: Basics"><p>Although I have written <a href="https://0xpatrik.com/subdomain-takeover-starbucks/">multiple</a> <a href="https://0xpatrik.com/takeover-proofs/">posts</a> about subdomain takeover, I realized that there aren't many posts covering basics of subdomain takeover and the whole &quot;problem statement.&quot; This post aims to explain (in-depth) the entire subdomain takeover problem once again, along with results of an Internet-wide scan that I performed back in 2017.</p>
<p><img src="https://0xpatrik.com/content/images/2018/06/notation.png" alt="Subdomain Takeover: Basics"></p>
<h2 id="groundzero">Ground Zero</h2>
<p>Subdomain takeover is a process of registering a non-existing domain name to gain control over another domain. The most common scenario of this process follows:</p>
<ol>
<li>Domain name (e.g., <em>sub.example.com</em>) uses a CNAME record to another domain (e.g., <em>sub.example.com</em> CNAME <em>anotherdomain.com</em>).</li>
<li>At some point in time, <em>anotherdomain.com</em> expires and is available for registration by anyone.</li>
<li>Since the CNAME record is not deleted from the <em>example.com</em> DNS zone, anyone who registers <em>anotherdomain.com</em> has full control over <em>sub.example.com</em> for as long as the DNS record is present.</li>
</ol>
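<p>The scenario above boils down to a simple check that can be automated over an entire DNS zone. This is a minimal sketch under hypothetical data: the <code>zone</code> mapping and <code>registrable</code> set are stand-ins for real DNS records and a real registrar availability lookup.</p>

```python
# Hypothetical zone data: source domain -> CNAME target.
zone = {
    "sub.example.com": "anotherdomain.com",
    "www.example.com": "example.com",
}

# Targets whose base domain is available for registration.
# In practice you would query a registrar API, not a hardcoded set.
registrable = {"anotherdomain.com"}

def takeover_candidates(zone: dict, registrable: set) -> list:
    # Flag source names whose CNAME target can be registered by anyone.
    return [src for src, target in zone.items() if target in registrable]

print(takeover_candidates(zone, registrable))  # ['sub.example.com']
```

<p>Real-world tooling additionally has to handle CNAME chains and cloud-provider targets, both of which are covered later in this post.</p>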
<p>The implications of the subdomain takeover can be pretty significant. Using a subdomain takeover, attackers can send phishing emails from the legitimate domain, perform cross-site scripting (XSS), or damage the reputation of the brand which is associated with the domain. You can read more about implications (risks) in my other <a href="https://0xpatrik.com/subdomain-takeover/">post</a>.</p>
<p>Subdomain takeover is not limited to CNAME records. NS, MX and even A records (which are not subject to this post) are affected as well. This post deals primarily with CNAME records. However, use cases for NS and MX records are presented where needed.</p>
<h2 id="regulardomains">Regular Domains</h2>
<p>DNS delegation using a CNAME record is entirely transparent to the user, i.e., it happens in the background during DNS resolution. The picture below illustrates the behavior of a web browser for the domain name which has CNAME record in place.</p>
<p><img src="https://0xpatrik.com/content/images/2018/06/takeover_basic.png" alt="Subdomain Takeover: Basics"></p>
<p>Note that a web browser implicitly trusts anything that the DNS resolver returns. Such trust means that when an attacker gains control over DNS records, all web browser security measures (e.g., <em>same-origin policy</em>) are bypassed. This presents a considerable security threat, since subdomain takeover breaks the authenticity of a domain, which can be leveraged by an attacker in several ways. As will be shown later, TLS/SSL does not fix this problem, since subdomain takeover is not a regular <a href="https://en.wikipedia.org/wiki/Man-in-the-middle_attack">Man-in-the-middle</a> style attack.</p>
<p><strong>CNAME subdomain takeover</strong>. One of the primary types of CNAME subdomain takeover is the scenario when a canonical domain name is a regular Internet domain (not one owned by cloud providers as will be explained <a href="#cloudproviders">below</a>). The process of detecting whether some source domain name is vulnerable to CNAME subdomain takeover is quite straightforward:</p>
<p><em>Given the pair of source and canonical domain names, if the base domain of a canonical domain name is available for registration, the source domain name is vulnerable to subdomain takeover.</em></p>
<p><img src="https://0xpatrik.com/content/images/2018/06/flowchart_regular_domain.png" alt="Subdomain Takeover: Basics"></p>
<p>The noteworthy part of the process is &quot;the base domain of a canonical domain name.&quot; That is because the canonical domain name might be in the form of a higher-level domain. If the base domain is available for registration, the higher-level domain names can easily be recreated in the DNS zone afterward.</p>
<p>Checking the availability of base domain names can be done using domain registrars such as <a href="https://www.namecheap.com/">Namecheap</a>. One might think that testing a DNS response status for NXDOMAIN is a sufficient indication that the domain name is available for registration. Note, however, that this is not the case: there are domain names that respond with NXDOMAIN but cannot be registered. Reasons include restricted top-level domains (e.g., .GOV, .MIL) or domain names reserved by TLD registrars.</p>
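<p>Reducing a higher-level canonical name to its base (registrable) domain is the step that trips up naive tooling. Proper code should consult the Public Suffix List (e.g., via the <code>publicsuffix2</code> package); the sketch below uses a toy two-entry suffix set just to show the mechanics, so treat <code>SUFFIXES</code> as a placeholder.</p>

```python
# Toy public-suffix set; real code must load the full Public Suffix List.
SUFFIXES = {"com", "co.uk"}

def base_domain(name: str) -> str:
    # Reduce a (possibly higher-level) canonical name to its registrable base:
    # find the longest matching public suffix, then keep one more label.
    labels = name.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in SUFFIXES:
            return ".".join(labels[i - 1:])
    return name

print(base_domain("deep.sub.anotherdomain.com"))  # anotherdomain.com
print(base_domain("shop.example.co.uk"))          # example.co.uk
```

<p>It is this base domain, not the full canonical name, whose registrability you check with the registrar.</p>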
<p><strong>NS subdomain takeover</strong>. The concept of subdomain takeover can be naturally extended to NS records: <em>If the base domain of canonical domain name of at least one NS record is available for registration, the source domain name is vulnerable to subdomain takeover.</em></p>
<p>One of the problems in subdomain takeover using NS records is that the source domain name usually has multiple NS records. Multiple NS records are used for redundancy and load balancing, and the nameserver is chosen randomly during DNS resolution. Suppose that the domain <em>sub.example.com</em> has two NS records: <em>ns.vulnerable.com</em> and <em>ns.nonvulnerable.com</em>. If an attacker takes over <em>ns.vulnerable.com</em>, the situation from the perspective of a user who queries <em>sub.example.com</em> looks as follows:</p>
<ol>
<li>Since there are two nameservers, one is randomly chosen. This means the probability of querying nameserver controlled by an attacker is 50%.</li>
<li>If the user's DNS resolver chooses <em>ns.nonvulnerable.com</em> (the legitimate nameserver), the correct result is returned and is likely cached for somewhere between 6 and 24 hours.</li>
<li>If the user's DNS resolver chooses <em>ns.vulnerable.com</em> (the nameserver owned by an attacker), the attacker might provide a false result, which will also be cached. Since the attacker is in control of the nameserver, she can set the TTL for this particular result to, for example, one week.</li>
</ol>
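<p>The 50% figure per lookup understates the real exposure: across repeated independent resolutions (many users, cache expiries), the chance that at least one lands on the attacker's nameserver approaches certainty. A small sketch of that arithmetic, assuming uniform random nameserver selection (a simplification of real resolver behavior):</p>

```python
def attacker_hit_probability(total_ns: int, attacker_ns: int, resolutions: int) -> float:
    # P(at least one of `resolutions` lookups hits an attacker nameserver),
    # assuming each lookup picks a nameserver uniformly at random.
    p_safe = (total_ns - attacker_ns) / total_ns
    return 1 - p_safe ** resolutions

# Two nameservers, one taken over:
print(attacker_hit_probability(2, 1, 1))   # 0.5
print(attacker_hit_probability(2, 1, 10))  # 0.9990234375
```

<p>Combined with the attacker-controlled long TTL from step 3, even a single poisoned answer can persist in caches for a week.</p>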
<p><strong>MX subdomain takeover.</strong> Compared to NS and CNAME subdomain takeovers, MX subdomain takeover has the lowest impact. Since MX records are used only to receive e-mails, gaining control over the canonical domain name in an MX record only allows an attacker to receive e-mails addressed to the source domain name. Although the impact is not as significant as for CNAME or NS subdomain takeover, MX subdomain takeover might play a role in spear phishing attacks and intellectual property theft.</p>
<h2 id="cloudproviders">Cloud Providers</h2>
<p>Cloud services have gained popularity in recent years. One of the basic premises of the cloud is to offload users from setting up their own infrastructure. Organizations are switching from on-premise setups to alternatives such as cloud storage, e-commerce in the cloud, and platform-as-a-service, to name a few.</p>
<p>After a user creates a new cloud service, the cloud provider in most cases generates a unique domain name which is used to access the created resource. Because registering a separate domain name via a TLD registrar for every customer is impractical given the large number of cloud service customers, cloud providers opt to use subdomains. The subdomain identifying a unique cloud resource often comes in the format of <em>name-of-customer.cloudprovider.com</em>, where <em>cloudprovider.com</em> is a base domain owned by the particular cloud provider.</p>
<p>If the cloud service registered by an organization is meant to be public (e.g., e-commerce store), the particular organization might want to have it present as part of their domain. The main reason behind this is branding: <em>shop.organization.com</em> looks better than <em>organization.ecommerceprovider.com</em>. In this case, the organization has two choices:</p>
<ul>
<li>
<p><strong>HTTP 301/302 redirect</strong> — <em>301</em> and <em>302</em> are HTTP response codes that trigger a web browser to redirect the current URL to another URL. In the context of cloud services, the first request is made to a domain name of an organization (e.g., <em>shop.organization.com</em>) and then redirect is made to a domain name of cloud providers (e.g., <em>organization.ecommerceprovider.com</em>).</p>
</li>
<li>
<p><strong>CNAME record</strong> — Using this method, the &quot;redirect&quot; happens during DNS resolution. The organization sets a CNAME record, and all traffic is automatically delegated to the cloud provider. Using this method, the URL in the user's browser stays the same. Note, however, that the particular cloud service must support delegation using CNAME records.</p>
</li>
</ul>
<p>If the CNAME record method is used, the possibility of subdomain takeovers comes into play. Even though the cloud provider owns the base domain of a canonical domain name, subdomain takeover is still possible as is presented in the next sections.</p>
<p>The providers in the subsequent sections were chosen based on three primary reasons:</p>
<ul>
<li><strong>Prevalence</strong> — Based on statistics on CNAME records, cloud providers domains with the highest usage in CNAME records were prioritized.</li>
<li><strong>Support for CNAME records</strong> — As explained above, the cloud provider needs to support CNAME delegation. Cloud providers realize that customers request such behavior, and the most popular ones already support it.</li>
<li><strong>Domain ownership verification</strong> — The chosen cloud providers do not verify ownership of the source domain name. Since ownership does not need to be proven, anyone can abuse an expired cloud configuration to perform a subdomain takeover.</li>
</ul>
<h4 id="amazoncloudfront">Amazon CloudFront</h4>
<p>Amazon CloudFront is a <em>Content Delivery Network (CDN)</em> in Amazon Web Services (AWS). A CDN distributes copies of web content to servers located in different geographic locations (called points of presence). When a user makes a request to the CDN, the closest point of presence is chosen based on the visitor's location to lower latency. CDNs are utilized by organizations mainly to distribute media files such as video, audio, and images. Other advantages of CDNs include protection against Denial of Service attacks, reduced bandwidth, and load balancing in case of high traffic spikes.</p>
<p>CloudFront uses <em>Amazon S3</em> as a primary source of web content. Amazon S3 is another service offered by AWS. It is a cloud storage service (S3 is an abbreviation for Simple Storage Service) which allows users to upload files into so-called <em>buckets</em>, which is a name for logical groups within S3.</p>
<p>CloudFront works with the notion of <em>distributions</em>. Each distribution is a link to a specific Amazon S3 bucket to serve objects (files) from. When a new CloudFront distribution is created, a unique subdomain is generated to provide access. The format of this subdomain is <em>SUBDOMAIN.cloudfront.net</em>. The <em>SUBDOMAIN</em> part is produced by CloudFront and cannot be specified by a user.</p>
<p>In addition to a randomly generated subdomain, CloudFront includes the possibility to specify an alternate domain name for accessing the distribution. This works by creating a CNAME record from the alternate domain name to the subdomain generated by CloudFront. Although Amazon does not provide documentation about the internal CloudFront concepts, the high-level architecture can be deduced from its behavior. Based on geographic location, a DNS query to any subdomain of <em>cloudfront.net</em> leads to the same A records (in the same region). This indicates that CloudFront is using a virtual hosting setup in the backend. After the HTTP request arrives, CloudFront's edge server determines the correct distribution based on the HTTP <em>Host</em> header. The documentation also supports this theory, as it states: <em>&quot;You cannot add an alternate domain name to a CloudFront distribution if the alternate domain name already exists in another CloudFront distribution, even if your AWS account owns the other distribution.&quot;</em> Having multiple alternate domains pointing to one distribution is fine; however, having the same alternate domain name present in multiple distributions is not.</p>
<p><img src="https://0xpatrik.com/content/images/2018/06/cloudfront_vh.png" alt="Subdomain Takeover: Basics"></p>
<p>Therefore to correctly handle alternate domain names, CloudFront needs to know beforehand to which distribution the alternate domain name is attached. In other words, having CNAME record configured is not enough, the alternate domain name needs to be explicitly set in distribution settings.</p>
<p>The problem with alternate domain names in CloudFront is similar to the problems explained in the <em>Regular Domains</em> section. Let's assume that <em>sub.example.com</em> has a CNAME record set to <em>d1231731281.cloudfront.net</em>. When <em>sub.example.com</em> is not registered in any CloudFront distribution as an alternate domain name, subdomain takeover is possible: anyone can create a new distribution and set <em>sub.example.com</em> as an alternate domain name in its settings. Note, however, that the newly created CloudFront subdomain does not need to match the one specified in the CNAME record (<em>d1231731281.cloudfront.net</em>). Since CloudFront uses a virtual hosting setup, the correct distribution is determined using the HTTP Host header, not the DNS record.</p>
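<p>The Host-header dispatch described above can be illustrated with a toy routing table. This is not CloudFront's actual implementation, just a sketch of why the CNAME target you were pointed at is irrelevant once the request reaches a shared edge; the table contents are hypothetical.</p>

```python
# Hypothetical edge-server routing table: alternate domain name -> distribution.
distributions = {
    "shop.organization.com": "d1111111111.cloudfront.net",
}

def route(host_header: str) -> str:
    # A virtual-hosting edge picks the distribution solely by the HTTP Host
    # header; which *.cloudfront.net name the CNAME pointed at does not matter.
    return distributions.get(host_header, "ERROR: no matching distribution")

print(route("shop.organization.com"))  # d1111111111.cloudfront.net
print(route("sub.example.com"))        # ERROR: no matching distribution
```

<p>The second lookup models exactly the takeover-indicating error page: a dangling CNAME reaches the edge, but no distribution claims the Host, so whoever registers it first wins.</p>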
<p>The picture below shows the error message that is presented after HTTP request to an alternate domain name which has the DNS CNAME record to CloudFront in place but is not registered in any CloudFront distribution.</p>
<p><img src="https://0xpatrik.com/content/images/2018/06/cloudfront_error.png" alt="Subdomain Takeover: Basics"></p>
<p>This error message is a solid indication of the possibility of subdomain takeover. Nevertheless, the two exceptions need to be taken into account:</p>
<ul>
<li><strong>HTTP / HTTPS only distributions</strong> — CloudFront allows specifying whether the distribution is HTTP-only or HTTPS-only. Switching HTTP to HTTPS might provide correct responses for some distributions.</li>
<li><strong>Disabled distribution</strong> — Some distributions might be disabled. A disabled distribution no longer actively serves content while still preserving its settings. This means an alternate domain name might throw the error message after an HTTP request yet still be registered inside a disabled distribution, and thus not be vulnerable to subdomain takeover. The correct way to determine whether an alternate domain is registered inside some distribution is to create a new distribution and try to set the alternate domain name. If the registration process does not throw an error, the custom domain is vulnerable to subdomain takeover. The screenshot below shows the error that is presented after the user tries to register an alternate domain name which is already present in some other CloudFront distribution.</li>
</ul>
<p><img src="https://0xpatrik.com/content/images/2018/06/5_cloudfront_error.png" alt="Subdomain Takeover: Basics"></p>
<h4 id="other">Other</h4>
<p>As presented in the case of CloudFront, subdomain takeover is possible even on cloud services whose base domain is not available for registration. Since these services provide a way of specifying alternate domain names (CNAME records), the possibility of subdomain takeover remains. This section provides a quick overview of other cloud services which work very similarly to CloudFront (virtual hosting architecture).</p>
<ul>
<li>
<p><strong>Amazon S3</strong> — Amazon S3 was briefly mentioned previously. The default base domain used to access a bucket is not always the same and depends on the AWS region that is used. The full list of Amazon S3 base domains is available in the AWS documentation. Similarly to CloudFront, Amazon S3 allows specifying an alternate (custom) domain name to access the bucket's content.</p>
</li>
<li>
<p><strong>Heroku</strong> — Heroku is a Platform-as-a-Service provider which enables deployment of an application using simple workflow. Since access to the application is needed, Heroku exposes the application using subdomain formed on <em>herokuapp.com</em>. However, it is also possible to specify the custom domain name to access the deployed application.</p>
</li>
<li>
<p><strong>Shopify</strong> — Shopify provides a way of creating and customizing e-commerce stores in the cloud. The default subdomain to access the store is built on <em>myshopify.com</em>. As with the services described before, Shopify allows specifying alternate domain names. Noteworthy is that Shopify verifies correct CNAME record configuration. However, this is not domain ownership verification: Shopify only checks that the expected CNAME record is present in the alternate domain's DNS zone. This verification, therefore, does not prevent subdomain takeovers.</p>
</li>
<li>
<p><strong>GitHub</strong> — GitHub is a hosting service for Git version control repositories. GitHub also offers free web hosting through its <em>GitHub Pages</em> project. This hosting is usually used for a project's documentation, technical blogs, or supporting web pages for open-source projects. GitHub Pages supports a custom domain name in addition to the default domain name under <em>github.io</em>.</p>
</li>
<li>
<p><strong>Microsoft Azure</strong> — Microsoft Azure is a more prominent cloud provider, similar to AWS. It differs from the cloud services mentioned above in that it does not use a virtual hosting architecture. Simply put, for each cloud service, Azure creates its own virtual machine with its own IP address. Therefore the mapping between a domain name and an IP address is unambiguous (one-to-one). Noteworthy is that since this is not a regular virtual hosting setup, a configured CNAME record does not necessarily have to be explicitly defined in the resource settings. Azure provides multiple cloud services, but the ones discussed in this post have default domains of <em>cloudapp.net</em> and <em>azurewebsites.net</em>. Its documentation describes setting up the link between a domain name and an Azure resource using A or CNAME records (pointing to one of the two domains mentioned previously). An interesting observation is that for A records, Azure performs domain ownership verification using TXT records. However, this is not the case for CNAME records, and subdomain takeover is therefore possible even in the case of Microsoft Azure.</p>
</li>
</ul>
<p>For an extended listing of affected cloud providers, I highly recommend checking <a href="https://github.com/EdOverflow/can-i-take-over-xyz">&quot;Can I take over XYZ?&quot;</a> guide.</p>
<h2 id="internetwidescan">Internet-wide Scan</h2>
<p><a href="https://0xpatrik.com/project-sonar-guide/">Project Sonar</a> can be used to show the prevalence of subdomain takeover across the Internet. Because Project Sonar already contains resolved CNAME records, it is pretty straightforward to automate scanning for subdomain takeover across the Internet. This section explains its results.</p>
<p><strong>Chain of CNAME records</strong>. In some instances, CNAME records might form CNAME record chains. Let's have the domain <em>sub.example.com</em> which has a CNAME record to <em>sub.example1.com</em>. If in turn, <em>sub.example1.com</em> has a CNAME record to <em>sub.example2.com</em> a three-way chain is formed:</p>
<p><code>sub.example.com -&gt; sub.example1.com -&gt; sub.example2.com</code></p>
<p>In such cases, when the base domain of the last domain in the chain (<em>example2.com</em>) is available for registration, both <em>sub.example1.com</em> and <em>sub.example.com</em> are affected. Fortunately, Project Sonar implicitly contains all CNAME references in the chain: for the chain given above, even though there is no direct CNAME record from <em>sub.example.com</em> to <em>sub.example2.com</em>, Project Sonar contains this record. Therefore, no changes need to be made to the automation tool to support CNAME record chains in Project Sonar.</p>
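<p>If your data source does not pre-flatten chains the way Project Sonar does, you have to follow them yourself. A minimal sketch, with a hop limit so a malformed loop cannot hang the scan; the <code>cnames</code> mapping is hypothetical example data:</p>

```python
# Hypothetical CNAME records forming the three-way chain from the text.
cnames = {
    "sub.example.com": "sub.example1.com",
    "sub.example1.com": "sub.example2.com",
}

def chain_terminus(name: str, cnames: dict, max_hops: int = 10) -> str:
    # Follow CNAME records until a name with no further CNAME is reached.
    hops = 0
    while name in cnames and hops < max_hops:  # hop limit guards against loops
        name = cnames[name]
        hops += 1
    return name

# If the terminus base domain is registrable, every name in the chain is affected.
print(chain_terminus("sub.example.com", cnames))  # sub.example2.com
```

<p>The registrability check from earlier is then applied to the terminus, and a hit marks every source name along the chain as vulnerable.</p>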
<p>The scanning was performed using a custom automation tool which I don't plan to release yet. The tool was able to scan cloud provider domains and found <strong>12,888</strong> source domain names vulnerable to subdomain takeover (November 2017). The cloud provider distribution follows:</p>
<p><img src="https://0xpatrik.com/content/images/2018/06/takeover_distribution.png" alt="Subdomain Takeover: Basics"></p>
<p><em>Some parts of this post are excerpts from my <a href="https://is.muni.cz/th/byrdn/Thesis.pdf">Master's Thesis</a></em>.</p>
<p>Check out my other posts about subdomain takeovers:</p>
<ul>
<li><a href="https://0xpatrik.com/subdomain-takeover/">Subdomain Takeover: Thoughts on Risks</a></li>
<li><a href="https://0xpatrik.com/subdomain-takeover-ns/">Subdomain Takeover: Going beyond CNAME</a></li>
<li><a href="https://0xpatrik.com/takeover-proofs/">Subdomain Takeover: Proof Creation for Bug Bounties</a></li>
<li><a href="https://0xpatrik.com/subdomain-takeover-candidates/">Subdomain Takeover: Finding Candidates</a></li>
</ul>
<p>Until next time!</p>
<p><a href="https://twitter.com/0xpatrik">Patrik</a><br>
<a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="false">Follow @0xpatrik</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p><a href="http://buymeacoff.ee/0xpatrik" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Subdomain Takeover: Basics" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;"></a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item><item><title><![CDATA[Subdomain Takeover: Starbucks points to Azure]]></title><description><![CDATA[<!--kg-card-begin: markdown--><p>This post is the write-up about <a href="https://hackerone.com/reports/325336">bug bounty report</a> that I reported back in March 2018 to Starbucks. The report is now disclosed, and I was awarded $2,000 bounty. Although I have written about subdomain takeover in <a href="https://0xpatrik.com/subdomain-takeover/">multiple</a> <a href="https://0xpatrik.com/takeover-proofs/">posts</a>, this case was somehow different.</p>
<p><img src="https://0xpatrik.com/content/images/2018/06/report_header.png" alt="HackerOne Report"></p>
<p>The domain in question was</p>]]></description><link>https://0xpatrik.com/subdomain-takeover-starbucks/</link><guid isPermaLink="false">5dd051c7140a9354e2f01633</guid><dc:creator><![CDATA[Patrik Hudak]]></dc:creator><pubDate>Mon, 25 Jun 2018 22:07:01 GMT</pubDate><media:content url="https://0xpatrik.com/content/images/2018/06/starbucks.jpg" medium="image"/><content:encoded><![CDATA[<!--kg-card-begin: markdown--><img src="https://0xpatrik.com/content/images/2018/06/starbucks.jpg" alt="Subdomain Takeover: Starbucks points to Azure"><p>This post is the write-up about <a href="https://hackerone.com/reports/325336">bug bounty report</a> that I reported back in March 2018 to Starbucks. The report is now disclosed, and I was awarded $2,000 bounty. Although I have written about subdomain takeover in <a href="https://0xpatrik.com/subdomain-takeover/">multiple</a> <a href="https://0xpatrik.com/takeover-proofs/">posts</a>, this case was somehow different.</p>
<p><img src="https://0xpatrik.com/content/images/2018/06/report_header.png" alt="Subdomain Takeover: Starbucks points to Azure"></p>
<p>The domain in question was <code>svcgatewayus.starbucks.com</code>. It pointed to a non-existent resource in <a href="https://azure.microsoft.com">Microsoft Azure</a>. I realized that I had never talked about Microsoft Azure as a potential vector for subdomain takeover.</p>
<p>Azure provides multiple services. I look for two primary ones:</p>
<ul>
<li>Azure Websites — <strong>.azurewebsites.net</strong></li>
<li>Cloud Apps — <strong>.cloudapp.net</strong></li>
</ul>
<p>The most significant difference compared to CloudFront and other similar services is that Azure provides a dedicated IP address for both of these services. The provisioned subdomain points to it using an A record. In other words, Azure doesn't use a virtual-host setup (as I described <a href="http://blog.sweepatic.com/subdomain-takeover-principles/">previously</a>). This means that for a potential subdomain takeover, you only need to look for the DNS status <code>NXDOMAIN</code>.</p>
<p>There are lots of misconceptions about when a subdomain takeover on Azure is possible. I recommend running a simple <code>dig</code> command:</p>
<pre><code>dig -t A DOMAIN_TO_CHECK
</code></pre>
<p>Is the response status <code>NXDOMAIN</code>? If yes, great, the takeover might be possible. Note that receiving a 404 HTTP error does not mean a subdomain takeover is possible! As I said before, these services have dedicated IP addresses. For a successful subdomain takeover, the DNS request should always return <code>NXDOMAIN</code>.</p>
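<p>The check above can be wrapped in a small helper. This is a minimal sketch, assuming <code>dig</code> (from bind-utils) is installed; the parsing relies on the <code>status:</code> field in dig's header comment line:</p>

```shell
# Hedged sketch: report whether a domain's A lookup returns NXDOMAIN,
# the precondition for an Azure subdomain takeover described above.
check_takeover_candidate() {
    # +noall +comments keeps only dig's comment lines, including the
    # HEADER line containing "status: NOERROR/NXDOMAIN/..."
    status=$(dig +noall +comments -t A "$1" 2>/dev/null \
        | grep -o 'status: [A-Z]*' | cut -d' ' -f2)
    if [ "$status" = "NXDOMAIN" ]; then
        echo "$1: NXDOMAIN - takeover might be possible"
    else
        echo "$1: ${status:-unknown} - not a candidate"
    fi
}
```

<p>At the time of the report, running this against <code>svcgatewayus.starbucks.com</code> would have printed the <code>NXDOMAIN</code> line.</p>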
<p>The subdomain in the report pointed to <code>1fd05821-7501-40de-9e44-17235e7ab48b.cloudapp.net</code>. I needed to create a PoC, which was a little bit tricky. A rough outline of how I did it follows:</p>
<ol>
<li>Created a new Cloud Service in the <a href="https://portal.azure.com/#create/Microsoft.CloudService">portal</a>. It asks for a custom domain name. <em>Remember: this domain name needs to match, since you are not dealing with virtual hosts anymore. You can confirm this theory by noticing that Cloud Service never asks for the domain name you will use for the CNAME</em>.</li>
<li>Created a Storage Account for the Cloud Service in the Azure portal.</li>
<li>Azure requires a specific format for deployment of Cloud Services which is generated by Visual Studio. I created a simple ASP.NET web application and uploaded it to this Cloud Service using <a href="https://docs.microsoft.com/en-us/azure/cloud-services/cloud-services-how-to-create-deploy-portal">this</a> tutorial.</li>
<li>Since the A record for <code>svcgatewayus.starbucks.com</code> points to Azure, HTTP requests now return the content from the ASP.NET application I just deployed.</li>
</ol>
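<p>The same DNS check scales to a whole list of candidate subdomains. A hypothetical sketch, assuming <code>dig</code> is installed and the input file (one hostname per line) exists; the two Azure suffixes are the ones listed earlier:</p>

```shell
# Hedged sketch: flag hostnames whose CNAME points into Azure but
# whose Azure target no longer resolves (NXDOMAIN).
scan_azure_dangling() {
    while read -r host; do
        # +short prints the CNAME target with a trailing dot
        cname=$(dig +short -t CNAME "$host" 2>/dev/null)
        case "$cname" in
            *.cloudapp.net.|*.azurewebsites.net.)
                status=$(dig +noall +comments -t A "$cname" 2>/dev/null \
                    | grep -o 'status: [A-Z]*' | cut -d' ' -f2)
                # NXDOMAIN on the Azure target = the resource was deleted
                if [ "$status" = "NXDOMAIN" ]; then
                    echo "candidate: $host -> $cname"
                fi
                ;;
        esac
    done < "$1"   # e.g. a subdomains file, one hostname per line
}
```

<p>Remember this only surfaces candidates: confirming the takeover still requires registering the matching resource in Azure, as in the PoC steps above.</p>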
<p>For Azure Websites, the process is much more straightforward and looks closer to traditional PaaS. To create a PoC for Azure Websites, I recommend following <a href="https://docs.microsoft.com/en-us/azure/app-service/app-service-web-get-started-python">this</a> tutorial. I tested it, and it works correctly.</p>
<p>I have to say that I find the Azure portal very messy. IMHO it is a lot more complex than AWS with no significant benefits.</p>
<p>Azure offers several other services with a different structure, subdomains, and PoC process that I will cover in one of the future posts. Follow me on <a href="https://twitter.com/0xpatrik">Twitter</a> to get it first.</p>
<p>Until next time!</p>
<p><a href="https://twitter.com/0xpatrik">Patrik</a><br>
<a href="https://twitter.com/0xpatrik?ref_src=twsrc%5Etfw" class="twitter-follow-button" data-show-count="false">Follow @0xpatrik</a><script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p>
<p><a href="http://buymeacoff.ee/0xpatrik" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Subdomain Takeover: Starbucks points to Azure" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;"></a></p>
<!--kg-card-end: markdown-->]]></content:encoded></item></channel></rss>