Python is the Future of OSINT – How to Stay Ahead

Introduction

The era of the non-tech-savvy OSINT professional is over.  The internet has become far too vast for any one person to collect from manually, and it will only continue to grow at an exponential pace.  This is my thesis, but I don’t claim to be an expert on the trend, nor do I expect it to go unchallenged.  The title of this post is “Python is the Future of OSINT”.  The word Python is really a stand-in for any programming language, but Python seems to be the language of choice for OSINT collectors.  Another disclaimer: I don’t claim to be an expert on Python either.  Quite the contrary, actually.  I’m pretty awful at it.  But I continue to develop my skills to stay one step ahead of the curve, maybe even automating a lot of my process along the way.  This post is an attempt to persuade you to start learning the trade and to give you a few examples of how you can do it.  I’ll even share my awful code to show you that a non-developer can still make this happen.

Opinion

Python is a really robust programming language and there’s a lot to learn.  That doesn’t mean you have to learn everything.  I think the most useful thing an OSINT collector can develop with Python is a web scraper.  So learn how to make a web scraper.  Try making a few different kinds of web scrapers.  Once you can build one from scratch that’s pretty robust, move on to something more technical.  I’m about to share with you the resources I used to learn Python and make my first web scraper.  Keep in mind this was a process guided by procrastination that took me almost a year to accomplish.  These resources aren’t in any particular order, but they are the ones I used.

Resources

Intro to Data Science with Python – DataCamp

I took this course back in August of 2017 when I was still living in Seattle.  I knew the OSINT community was shifting and that cyber and physical security were converging.  I was looking at getting my Master’s in Cyber Security Engineering from the University of Washington at the time but had limited technical experience.  I was an intelligence researcher primarily focusing on the APAC region and was looking for ways to streamline my process.  The course was free and I liked the UI of the DataCamp website, so I gave it a shot.  It’s a great introduction to Python, and I even learned SQL (sort of) while I was at it.

Pictures or it never happened.

Python for Beginners – Automating OSINT (Justin Seitz)

I can’t remember exactly when I took this course, and I have since forgotten my login information, but it was a good addition to the DataCamp course and I liked how it was specifically tailored to the OSINT community.  In addition to learning and practicing more Python, I also learned what types of projects Python can build that are useful for collecting OSINT.  This was probably the biggest lesson in the course: not exactly how to build things, but what types of things to build.  I never took Seitz’s Master Class, which covers the projects he described in the course in more detail, but I’ve heard great things about it.  The course is only $49, which is money, but not very much of it.  I feel you get disproportionately more value from the course than what it costs.  But that’s what Justin does: he gives you a ton of value for a dime (think Hunch.ly!).

Python Tutorial: Web Scraping with BeautifulSoup and Requests

This is the exact video I used to create StormTrooper.  It’s simple, but it’s the best guide I found on YouTube that clearly takes you from start to finish.  It’s free, but it’s 45 minutes long.  Saves you coin but not time.  I think it’s incredibly worth it.  There’s something I’d like to mention, though, before I wrap up the resources part of this post.  I had a decent aptitude for HTML and CSS before I even thought about starting Python.  This came from my eCommerce background, designing websites to sling products online since around 2012.  I never got into JavaScript, so I wouldn’t consider myself a front-end developer, but when it comes to web scraping, understanding HTML tags is incredibly valuable.
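
To show why knowing your HTML tags matters, here is a minimal sketch of the idea.  It is not the StormTrooper code, and it swaps in Python’s built-in html.parser module for the BeautifulSoup and Requests combination the video uses, so it runs with no installs.  The sample page, the “title” class, and the names HeadlineParser and scrape_headlines are all made up for illustration; a real site will have its own tags and classes that you would find by inspecting the page source.

```python
from html.parser import HTMLParser

# Hypothetical page: headlines live in <h2 class="title"> tags.
SAMPLE_HTML = """
<html><body>
  <h2 class="title"><a href="/posts/first">First post</a></h2>
  <p>Some teaser text with an <a href="/about">unrelated link</a>.</p>
  <h2 class="title"><a href="/posts/second">Second post</a></h2>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collect (text, href) for every <a> inside an <h2 class="title">."""

    def __init__(self):
        super().__init__()
        self.in_title = False  # True while inside <h2 class="title">
        self.in_link = False   # True while inside an <a> within that <h2>
        self.href = None
        self.text_parts = []
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "h2" and "title" in classes:
            self.in_title = True
        elif tag == "a" and self.in_title:
            self.in_link = True
            self.href = attrs.get("href")
            self.text_parts = []

    def handle_data(self, data):
        # Only keep text that sits inside a headline link.
        if self.in_link:
            self.text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            text = "".join(self.text_parts).strip()
            self.headlines.append((text, self.href))
            self.in_link = False
        elif tag == "h2":
            self.in_title = False


def scrape_headlines(html):
    """Return a list of (headline text, link) tuples found in the HTML."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    return parser.headlines


for text, href in scrape_headlines(SAMPLE_HTML):
    print(f"{text} -> {href}")
```

The link in the teaser paragraph is skipped because it does not sit inside a headline tag, which is the whole trick: the scraper only knows what to grab because you told it which tags and classes to look for.  On a live site you would download the page first and pass its HTML to the same function.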

Bonus: FreeCodeCamp

You’ve probably heard me rant about this before on Twitter.  FreeCodeCamp is THE best way to learn both the front and back end of web development.  It’s probably the most robust option out there, and it includes free certifications.  Many testimonials even mention landing a job after getting all of their certs.  Also, make sure to check out their blog on Medium, where they post valuable content that is very easy to read.  If you take this journey slow and spend 20-30 minutes a day on FreeCodeCamp and the other tutorials I’ve mentioned above, you’ll probably be better than me after a year.  I’ve been slacking big time!

Mindset

If you make use of all the resources mentioned above, you’ll likely be capable of writing a web scraper, but that doesn’t equip you with the mindset of knowing what is valuable to scrape.  While my Python ability is passable at best, learning how to build a scraper has made it easier to communicate with real Python devs.  My projects, StormTrooper and InfoWarts, are laughably bad compared to Twint, Skiptracer, SpiderFoot, etc.  But now that I understand how one is constructed, I can talk the talk even if I can’t walk the walk.  This has helped me while working on JungleScam with Francesco Poldi.  He’s an amazing developer, but I wouldn’t know what to ask for if I didn’t know what was possible.

I feel most people in the OSINT community are one of two types, though some are both.  The first type of OSINT person knows what information is valuable because they’ve been asked to provide it before, they’re generally curious, and they’ve faced numerous intel gaps in the past that they wish they could fill.  Then there’s the person who has never actually held a professional OSINT role but is a wizard developer who stumbles across valuable OSINT by sheer luck and insane curiosity.  Then there are people who are both and end up making things like Hunchly or SpiderFoot.  I find myself in the first category, but where are you?  I hope to improve my Python skills so I can add more value to the community and provide answers to a lot of the good questions I think I’m asking.

Conclusion

I don’t want to take up any more of your time so I’ll conclude with one of my favorite internet trends.

tl;dr

  • Take the free Python course at DataCamp, practice for 2 weeks
  • Buy Justin Seitz’s course to learn the “what” to add to the “how” (and get more practice).
  • Watch the video I linked to in this article to get your hand held through building your first web scraper
  • Check out FreeCodeCamp if HTML is new to you.
  • Stay self-aware, ask the right questions, and keep learning and networking
  • The whole thing will cost you $50 and a few weeks of your life.
