I recently did a deep dive on the OSINT tool SpiderFoot. There are so many features, and I don’t think I did it justice in such a short time frame (30 minutes). I thought a good way to add value would be to get the information from the source: the developer of SpiderFoot. Enjoy reading Steve Micallef’s answers to a variety of questions I’ve received and have been curious about myself. Make sure to reach out to him at @binarypool on Twitter if you have more questions, and check out spiderfoot.net to get your hands dirty!
1. Who are you? What is your background?
For the last 10 years or so I’ve held mostly management positions in Information Security but my background is mostly in security and systems engineering, including a fair bit of software development (though I wouldn’t rate myself highly as a true software engineer). Outside of my working life I spend a lot of time developing SpiderFoot, advising startups and helping out a little bit with BSides Zurich when time permits.
2. How did you get into OSINT?
About 15 years ago when I was working in a large bank as a Security Analyst, I remember attending a training course by Haroon Meer and Roelof Temmingh (Roelof eventually went on to create Maltego). They demoed a tool called “Bidiblah” which did a bunch of reconnaissance tasks to build up a footprint of a domain name. I was really impressed with how they created an easy-to-use GUI that was quite simple but also powerful in its ability to obtain a lot of data from basic methods like web scraping and DNS brute-forcing. I figured I could learn C# through creating something with a similar idea behind it–but open source–and did so in 2005 by creating the first incarnation of SpiderFoot as a Windows desktop application. I didn’t come back to SpiderFoot again until 2012, this time with the motivation to learn Python, so I re-wrote it from scratch. Since working on SpiderFoot, my appreciation for OSINT has grown as more data sources have become available and the community has grown.
3. Why did you decide to develop SpiderFoot?
Quite honestly my initial goal was simply to learn C#, and then again later as a way to learn Python because I learn through doing. Over time, as I’ve received feedback from the community and seen the growing interest in SpiderFoot HX, the rewarding feeling of gradually crafting and maturing a product is for me an ongoing source of motivation and satisfaction.
4. Are there any other tools you’ve built besides SpiderFoot?
Plenty of different security systems and ad-hoc tools over the years, but SpiderFoot is the only open source one for now 🙂 I’m tinkering with a couple of other ideas when I need some contrast to SpiderFoot but I’m not sure if they’ll ever see the light of day.
5. What is your favorite feature in SpiderFoot?
SpiderFoot is quite unique amongst OSINT tools in that it doesn’t just take a user’s target, query a bunch of APIs for that target and dump data. For SpiderFoot, this is only the first step in the process, because any data that is found is then analysed and used as a basis for more data collection and analysis until a complete picture is formed about the target and its affiliates. For example, if we are scanning example.com, we might find from some API that firstname.lastname@example.org exists. SpiderFoot will then automatically query data sources about that e-mail address, perhaps to find out if it’s been in a breach, if there’s a name associated or if there are PGP keys registered somewhere, and so on. This kind of logic is why SpiderFoot returns so much data about its scan targets. Sure, it comes at the price of obtaining some irrelevant data and taking a while to complete, but I would prefer this to a fast scan that misses valuable data.
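To make the idea of "found data feeds further collection" concrete, here is a minimal sketch of that loop in Python. It is not SpiderFoot's actual code or module API: the event types, the handler functions and the canned data they return (including the address `jane.doe@example.com` and the breach name) are all hypothetical, chosen so the example runs offline.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# An event is a (type, value) pair, e.g. ("DOMAIN", "example.com").
Event = Tuple[str, str]


# Hypothetical stand-ins for data-source modules. Real modules would query
# external APIs; these return canned data so the sketch runs offline.
def emails_for_domain(domain: str) -> List[Event]:
    if domain == "example.com":
        return [("EMAIL", "jane.doe@example.com")]
    return []


def breaches_for_email(email: str) -> List[Event]:
    if email == "jane.doe@example.com":
        return [("BREACH", "hypothetical-breach-2019")]
    return []


def domain_from_email(email: str) -> List[Event]:
    # An e-mail address also reveals a domain, which may or may not be new.
    return [("DOMAIN", email.split("@", 1)[1])]


# Which handlers are interested in which type of event (assumed mapping).
HANDLERS: Dict[str, List[Callable[[str], List[Event]]]] = {
    "DOMAIN": [emails_for_domain],
    "EMAIL": [breaches_for_email, domain_from_email],
}


def scan(target: Event) -> List[Event]:
    """Keep feeding newly found data back into the handlers until nothing new appears."""
    seen: Set[Event] = {target}
    queue = deque([target])
    found: List[Event] = []
    while queue:
        event_type, value = queue.popleft()
        for handler in HANDLERS.get(event_type, []):
            for new_event in handler(value):
                if new_event not in seen:  # avoid re-processing and infinite loops
                    seen.add(new_event)
                    found.append(new_event)
                    queue.append(new_event)
    return found


if __name__ == "__main__":
    for event in scan(("DOMAIN", "example.com")):
        print(event)
```

The `seen` set is what keeps a scan like this from looping forever: the e-mail address points back to a domain that has already been processed, so it is not queued again. The trade-off described above shows up here too, since every extra handler widens what is found and lengthens the scan.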
6. What are some features you’d like to add?
I have a (growing) backlog of over 100 different data sources I’d like to add modules for but one area I want to further develop SpiderFoot in is its ability to integrate with other tools like Maltego, Splunk, Hunchly, GraphXR and others. The reality is that threat intelligence teams, SOC analysts and investigators who rely on SpiderFoot are using it within a broader ecosystem of tools that each have their own strengths. As someone who has long been on the consumer side of security products I’ve been often frustrated at the lack of interoperability, so that’s definitely a gap I don’t want to see in SpiderFoot.
7. For aspiring OSINT tool developers, where’s a good place to start?
A good place to start is to contribute to an existing tool out there. Most of the tools are written in Python, which is fairly accessible as a language, so if you see a missing feature, data source or whatever in a tool you’ve come across, I suggest contacting the developer and asking if they’d like such a contribution, and then go ahead and issue a pull request to their repository. If that is too daunting, then start by creating something that scratches your own itch. Maybe you have a very specific case of combining a few different data points to get unique insight – just do it, because you’ll learn a lot in the process even if the tool doesn’t become open source.
8. To your knowledge, has SpiderFoot been used for any serious OSINT investigations?
I often get e-mails from people telling me that they find SpiderFoot invaluable in their investigations and regular threat intelligence work, which I am always happy to read. I have heard of a few cases (never any specifics mentioned though) of SpiderFoot having played a pivotal role in identifying information about people doing some pretty bad things. Apparently it’s been used in some Bellingcat investigations too. If people do have “SpiderFoot stories” to share with me though, I’m always happy to hear them!
9. SpiderFoot uses a lot of services to collect OSINT. Are there any you’re considering removing to streamline the process?
The goal is always to add more sources as they emerge, and prune those that die out. I am growing less fond of scraping over time because I find it yields inconsistent results, can break at any point without notice and is against the terms of service for most sites, so I will probably phase scraping-based modules out over time in favour of API-based alternatives.
10. What type of content do you use to learn more about the field?
Twitter is the source I use the most – watching the #OSINT hashtag and folks like @WebBreacher, @dutch_osintguy and @kirbstr. There is also the OSINT Rocket Chat group (specifically the #General and #Resources rooms). And listening to your podcast obviously 😉 A blog post I recently wrote has a section about this with more links: https://medium.com/@micallst/osint-resources-for-2019-b15d55187c3f