The internet is vast, and the amount of content published on it every day, every second, is unfathomable. As an OSINT investigator, collector, or analyst, you need a way to cut through the chaff and get to the information that’s valuable to you. The first thing you have to understand is that you will not get everything. That’s right, you’re going to miss most of it. The only thing you can hope to do is increase your chances of finding the content you’re looking for by reducing the time constraint as best you can. This can be achieved through data mining, web scraping, and manual analysis of dead ends over time. You need to know where the good stuff (or bad) is most likely to be found and where it is most likely not to be found. From there, you can focus your available time to achieve optimal results. I’m a big believer in the Pareto Principle, meaning that 80% of the value is found in 20% of the results; likewise, 20% of your effort will yield 80% of your value.
With that being said, let’s talk about one place where the OSINT community can find valuable content: paste sites. More often than not, that means Pastebin. Applying the Pareto Principle, this article will focus solely on Pastebin while intentionally ignoring Ghostbin, Skidbin, and other paste sites. This is because we are trying to spend the smallest amount of time to create the most value, that 80/20. With that introduction, let’s get started.
First, I think it’s important for anyone in the OSINT community to get a Pastebin API key by purchasing a Pastebin Pro account. It’s a lifetime subscription that’s only $49.95 at the time of this writing. Look out for Black Friday deals if you’re really strapped for cash, as it sometimes drops to $19.99. This allows you to build your own scrapers, set up multiple email alerts, and access different types of pastes including public, private, and unlisted.
Without any technical knowledge, there’s a simple place to start. You must first acknowledge that individuals or groups engaging in malicious or nefarious activity aren’t likely to want their pastes indexed by search engines. Pastebin makes this easy by allowing private and unlisted pastes. However, what many don’t know is that unless they share the links to those pastes only in a private forum or encrypted messenger, they’re leaving breadcrumbs. I’ve written an extensive post on how to use Google Dorks to locate unlisted pastes whose links have been indexed in forums, on social media, and on other platforms by specifying intext: operators while negating inurl: operators. Read the article for full context and to lay the foundation for other, more advanced tools that close a few intel gaps.
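To make the operator idea concrete, here is a minimal Python sketch that assembles such a query string. The helper name build_dork and the example keyword are hypothetical, and the dorks in the linked post may well be structured differently; the only operators used are intext: and a negated inurl:, as described above.

```python
def build_dork(keyword, mention="pastebin.com", exclude_url="pastebin.com"):
    """Build a search query for pages that mention a Pastebin link
    alongside a keyword, while excluding Pastebin's own pages."""
    parts = [
        f'intext:"{mention}"',     # page text references a paste link
        f'intext:"{keyword}"',     # and contains the term of interest
        f"-inurl:{exclude_url}",   # but the page is not on Pastebin itself
    ]
    return " ".join(parts)


if __name__ == "__main__":
    # Paste the printed string into the search box.
    print(build_dork("example.com"))
```

Swapping the keyword for a domain, username, or email address you are responsible for is the obvious starting point.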
Open Source Tools (Python)
Once you get the hang of the platform, discover a few unlisted pastes, and conduct a few basic investigations, it’s time to move on to more advanced tools. For this, you’ll likely need a very basic understanding of Git, Python, and troubleshooting. I want to emphasize basic. I’ve proven to myself on multiple occasions to be terrible at all three. I get by with a little help from my friends (Francesco, I’m talking about you). If you aren’t familiar with any of the above, you can either jump in and get your hands dirty or seek out free or cheap materials to learn. I recommend the Python course by Justin Seitz for Python. I recommend YouTube and only YouTube for Git. You’ll pick up a few tricks like the tree command along the way that will save you tons of time. If you aren’t yet at the level to start using open source OSINT tools, come back when you are ready, padawan. Otherwise, move on to the next section.
I did a written interview with the developer of Scavenger and posted the conversation on this blog. In a nutshell, this bot allows you to search for leaked credentials on a variety of paste sites. For the OSINT community, this is a potentially great tool for both blue team and red team applications: you can search for exposed credentials in order to secure them and protect assets, or exploit vulnerabilities internally to drive process and infrastructure improvements. What’s great about Scavenger is that it doesn’t just talk the talk, it walks the walk. The developer has connected a Twitter account to show the bot in action. Now, let’s talk details. What can Scavenger find for you? According to the GitHub wiki, Scavenger has found:
- private RSA keys
- WordPress configuration files
- MySQL connect strings
- onion links
- links to files hosted inside the onion network (PDF, DOC, DOCX, XLS, XLSX)
The GitHub page is very intuitive and should serve as a guide to tailor the tool to your specific needs. Give it a spin and reach out to the developer with questions.
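To give a feel for how a bot might recognize the kinds of content listed above, here is a minimal Python sketch using regular expressions. The patterns and the function name classify_paste are my own illustrative assumptions, not Scavenger’s actual detection rules, and they will produce both false positives and misses.

```python
import re

# Illustrative patterns only; these are assumptions, not Scavenger's own rules.
PATTERNS = {
    "rsa_private_key": re.compile(r"-----BEGIN RSA PRIVATE KEY-----"),
    "wordpress_config": re.compile(r"define\(\s*['\"]DB_PASSWORD['\"]"),
    "mysql_connect": re.compile(r"mysqli?_connect\s*\("),
    # 16-character (v2) or 56-character (v3) base32 onion hostnames
    "onion_link": re.compile(r"\b[a-z2-7]{16}(?:[a-z2-7]{40})?\.onion\b"),
}


def classify_paste(text):
    """Return the sorted names of every pattern that matches the paste text."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(text))


if __name__ == "__main__":
    sample = "define('DB_PASSWORD', 'hunter2'); mirror at http://aaaaaaaaaaaaaaaa.onion/doc.pdf"
    print(classify_paste(sample))
```

A real tool would need far more patterns and some validation of each hit, but the shape is the same: fetch a paste, run it through a set of detectors, and keep whatever matches.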
If you’re new to web scrapers, don’t have a Pastebin API key, or want to get a taste of what Pastebin has to offer, check out PwnBin. It’s a very basic web scraper for Pastebin that allows you to specify a keyword and an output file type. As a disclaimer, I call this a basic scraper because it has issues. One of those issues is that it’s limited by the number of requests it can make before being blocked by Pastebin. In my case, it was only able to scrape 55 pages for my specified keyword. Now, 55 pages is still quite a bit. However, compared with the total number of pastes available, it’s a negligible sample. Additionally, PwnBin is very slow. The scrape of 55 pages took long enough that I left my workstation and came back to check on it. Maybe 20 minutes. This can easily be run in the background; however, keeping in mind total allocated time versus total output, this tool should only serve as an introduction to web scraping on Pastebin, as something quick to set up and leave running in the background, or as a benchmark against other, more sophisticated tools.
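The request limit is easier to reason about with a small example. Below is a minimal Python sketch of a throttled keyword scrape that stops as soon as the site starts refusing requests. This is not PwnBin’s code; the function scrape_keyword and the caller-supplied fetch function are hypothetical, and fetch is assumed to return the paste text, or None once you are blocked.

```python
import time


def scrape_keyword(paste_ids, fetch, keyword, delay=1.0, sleep=time.sleep):
    """Fetch pastes one at a time and keep the IDs of those containing the keyword.

    `fetch` is a caller-supplied (hypothetical) function that returns the
    paste text, or None once the site starts refusing requests.
    """
    hits = []
    for i, paste_id in enumerate(paste_ids):
        if i:
            sleep(delay)       # pause between requests to stay polite
        text = fetch(paste_id)
        if text is None:       # blocked: stop rather than hammer the site
            break
        if keyword.lower() in text.lower():
            hits.append(paste_id)
    return hits


if __name__ == "__main__":
    # Stand-in data so the sketch runs without touching the network.
    fake_site = {"a1": "Password dump 2019", "b2": "cat pictures", "c3": None}
    print(scrape_keyword(["a1", "b2", "c3"], fake_site.get, "password", delay=0))
```

A longer delay means fewer blocks but an even slower scrape, which is exactly the time-versus-output trade-off described above.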
pwnedOrNot is a level up from PwnBin. It’s more sophisticated and has additional features and options, but it is also limited by the lack of an API. What is interesting about pwnedOrNot is that it uses the haveibeenpwned API to benchmark and validate the breaches found in the Pastebin scrape. Officially, the use of pwnedOrNot is “to find passwords for compromised email addresses”. So, what pwnedOrNot does is scrape for compromised email addresses, then cross-check that data against haveibeenpwned for the information below:
- Name of Breach
- Domain Name
- Date of Breach
- Fabrication Status
- Verification Status
- Retirement Status
- Spam Status
This tool has been tested on:
- Kali Linux 2019.1
- BlackArch Linux
- Ubuntu 18.04
- Kali Nethunter
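The breach details that pwnedOrNot reports line up with attributes of the haveibeenpwned breach model. Here is a minimal Python sketch of pulling those attributes out of a breach record. The attribute names (Name, Domain, BreachDate, IsFabricated, IsVerified, IsRetired, IsSpamList) are taken from my reading of the haveibeenpwned API documentation; the function name summarize_breach and the sample record are made up for illustration, and this is not pwnedOrNot’s own code.

```python
import json

# Report labels mapped to breach-model attribute names (per the HIBP API docs).
FIELDS = {
    "Name of Breach": "Name",
    "Domain Name": "Domain",
    "Date of Breach": "BreachDate",
    "Fabrication Status": "IsFabricated",
    "Verification Status": "IsVerified",
    "Retirement Status": "IsRetired",
    "Spam Status": "IsSpamList",
}


def summarize_breach(breach):
    """Pick the report fields out of one breach record (a dict)."""
    return {label: breach.get(attr) for label, attr in FIELDS.items()}


# Made-up record for illustration; not a real breach.
SAMPLE = json.loads("""
{"Name": "ExampleBreach", "Domain": "example.com", "BreachDate": "2019-01-01",
 "IsFabricated": false, "IsVerified": true, "IsRetired": false, "IsSpamList": false}
""")

if __name__ == "__main__":
    for label, value in summarize_breach(SAMPLE).items():
        print(f"{label}: {value}")
```

Using .get means a missing attribute comes back as None rather than raising an error, which is handy when a record is incomplete.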
Now, again, for the sake of liability, you should only use this tool for blue team or red team applications. I don’t recommend using it for any black hat purposes, as doing so may violate your country’s laws.
So we’ve covered the shotgun approach with Scavenger, the pea shooter approach with PwnBin, and the sniper approach with pwnedOrNot. Let’s talk about something a bit more niche. CardPwn is a tool that specifically searches for credit card numbers. It works like this: [enter card number] > [scrape] > [results]. This is different from pwnedOrNot because it’s not looking for emails or any other leaked credentials. It’s also not looking for every exposed credit card number, only the ones you specify. It’s a great blue team tool for checking your own cards, those of your executives, or those of any other individual seeking to increase their financial security or gain situational awareness.
As a disclaimer, this tool is and has been under development, and I’ve had issues with reliability. The most recent update spat back a few errors that made the tool unusable for me; however, I’ve used it in the past with some success. At the time of this writing, the last update was 9 days ago.
This tool has been tested on:
- Kali Linux 2019.1
- Ubuntu 18.04
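To illustrate the [enter card number] > [scrape] > [results] flow that CardPwn follows, here is a minimal Python sketch that sanity-checks a card number with the Luhn checksum and then looks for it in a set of paste texts. It is not CardPwn’s code; the function names and the in-memory pastes dictionary are hypothetical stand-ins for a real scrape, and the number used is the widely published Visa test number, not a real card.

```python
import re


def luhn_valid(number):
    """Check a digit string against the Luhn checksum used by payment cards."""
    digits = [int(d) for d in number]
    # Double every second digit, counting from the right.
    for i in range(len(digits) - 2, -1, -2):
        doubled = digits[i] * 2
        digits[i] = doubled - 9 if doubled > 9 else doubled
    return sum(digits) % 10 == 0


def find_card(card_number, pastes):
    """Return IDs of pastes containing the card number,
    allowing a space or dash between digits."""
    target = re.sub(r"\D", "", card_number)
    if not (13 <= len(target) <= 19 and luhn_valid(target)):
        raise ValueError("not a plausible card number")
    # Match the exact digit sequence, not a substring of a longer number.
    pattern = re.compile(r"(?<!\d)" + r"[ -]?".join(target) + r"(?!\d)")
    return [pid for pid, text in pastes.items() if pattern.search(text)]


if __name__ == "__main__":
    pastes = {
        "p1": "card: 4111 1111 1111 1111 exp 01/30",
        "p2": "nothing to see here",
    }
    print(find_card("4111-1111-1111-1111", pastes))
```

Rejecting numbers that fail the checksum up front catches typos before you spend any time scraping for a number that cannot exist.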
So, that’s a wrap. I hope you enjoyed this quick look at OSINT applications for Pastebin and other paste sites. I’ll likely update this when I discover new tools, or simply tweet about the new tool while linking back to this article for reference. Make sure to share this on Twitter and other platforms, and leave feedback in the comment section. If you have any questions about the tools mentioned, feel free to send me a DM on Twitter. I’d advise against using the contact page on this blog (I’m considering getting rid of it anyway).