One of my main missions is to help create a bridge between the ‘old school’ OSINT community and the infosec recon community. OSINT has started to gain traction in the infosec community, but there’s still a bit of a communication gap. Private investigators and intelligence analysts aren’t communicating with penetration testers and DFIR experts as much as they should, and when they do, they’re often using different terminology and applying the same tools with two completely different methodologies. This article is an attempt to identify the crossroads between the hashtag #osint and other hashtags in infosec and neighboring communities, such as investigative journalism. I’ve spent the last 6 months or so interacting with the #osint community on Twitter. At the time of this writing, I’m the 18th most active member on Twitter using the #osint hashtag for 2018. I want to explore these findings and other insights.
The data for this study was obtained using Twint, an advanced Twitter web scraping tool written in Python. I scraped all of the tweets mentioning “#osint”, “#recon”, “#socialengineering”, “#socmint”, “#dfir”, and “#geoint”. I took the data and exported it to a .csv file. I then imported that .csv into Excel where I manipulated the data using pivot tables and charts.
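For readers who would rather script the pivot-table step than do it in Excel, here is a minimal sketch of counting tweets per month for one hashtag. It assumes the exported .csv has `date` (YYYY-MM-DD) and `tweet` columns; those column names and the sample rows are illustrative assumptions, not taken from my actual export.

```python
import csv
import io
from collections import Counter


def tweets_per_month(csv_file, hashtag):
    """Count rows per 'YYYY-MM' whose text mentions the hashtag.

    Assumes 'date' and 'tweet' columns (hypothetical names). Matching is a
    case-insensitive substring test, so '#osint' also matches '#osinttools'.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        if hashtag.lower() in row["tweet"].lower():
            counts[row["date"][:7]] += 1
    return dict(sorted(counts.items()))


# Made-up sample standing in for the exported file.
sample = io.StringIO(
    "date,time,tweet\n"
    "2018-01-03,09:15:00,New #osint tool released\n"
    "2018-01-20,14:02:00,#dfir and #OSINT meetup\n"
    "2018-02-11,11:30:00,Reading about #osint\n"
    "2018-02-12,16:45:00,#recon notes\n"
)
monthly = tweets_per_month(sample, "#osint")
print(monthly)
```

With a real export, you would pass an open file handle instead of the in-memory sample.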
The hashtags mentioned were selected arbitrarily, based on my experience on Twitter over the last 6 months. These are not the hashtags most closely associated with #osint, as I soon found out after another user on Twitter gave me the exact data for the last month. I attempted to pivot the study to examine those hashtags instead, but I ran into scraping issues after an update to Twint, which sent me back to the original dataset.
Here’s what I came up with.
After extracting every tweet mentioning the selected hashtags, it turns out #osint, #socialengineering, and #dfir were mentioned the most this year. #socialengineering and #dfir have actually seen a relative decrease in mentions towards the end of the year, whereas #osint continues to rise. #socmint, #geoint, and #recon were lower in comparison, with a steady increase in #recon over time.
What does this information tell me? It tells me the #dfir and #socialengineering communities on Twitter are very active, and if the #osint community engages them properly, a lot of value can be found. As for the rest of the hashtags, activity is relatively low, but as we will see in later data, there’s still a lot of value to be found there. It’s interesting to see that #osint and #recon are both on the rise.
Here’s where things get interesting. If you refer to the first chart, you’ll see that #osint, #dfir, and #socialengineering are all very active. However, the engagement in each hashtag is very different. #dfir has the most engagement by a long shot, despite tapering off compared to #osint in overall tweets towards the end of the year. #socialengineering has very poor engagement, almost comparable to #recon, despite having significantly more tweets than #recon. The rest of the data is to be expected considering the revelations of the first chart.
What does this information tell me? It tells me that engagement from #osint is far more likely to be well received in the #dfir community than in the #socialengineering community. However, because #socialengineering has many tweets but little engagement, #osint has a greater opportunity there to capture a meaningful share of the overall engagement. The same can be said about #recon, #socmint, and #geoint, not accounting for overall activity (number of tweets).
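To make the activity versus engagement comparison concrete, here is a rough sketch of an engagement-per-tweet ratio. It assumes engagement means likes plus retweets plus replies on each tweet. That definition is an assumption for illustration, and the numbers below are invented, not my results.

```python
def engagement_per_tweet(rows):
    """Average engagement per tweet for each hashtag.

    rows: iterable of (hashtag, likes, retweets, replies) tuples.
    'Engagement' here is likes + retweets + replies, an assumed definition.
    """
    totals = {}
    for tag, likes, retweets, replies in rows:
        tweets, engagement = totals.get(tag, (0, 0))
        totals[tag] = (tweets + 1, engagement + likes + retweets + replies)
    return {tag: eng / n for tag, (n, eng) in totals.items()}


# Invented sample: few #dfir tweets with high engagement,
# more #socialengineering tweets with low engagement.
sample_rows = [
    ("#dfir", 10, 4, 1),
    ("#dfir", 6, 2, 1),
    ("#socialengineering", 1, 0, 0),
    ("#socialengineering", 0, 1, 0),
    ("#socialengineering", 1, 0, 0),
]
ratios = engagement_per_tweet(sample_rows)
print(ratios)
```

A hashtag with a high tweet count but a low ratio is the kind of gap described above.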
Every community has users with different lifestyles. This means that they are likely to be active at different times of day. If I want to engage with other communities on Twitter and build a bridge between them, I need to join the conversation when everyone is at the party. This chart displays the hours of the day each community is active. It is derived from the total number of tweets made per hour across 2018. This data reflects Central Time.
How can I use this information? There are two things I want to do here. First, I want to engage each community during their peak hours. Second, I want to engage each community when their activity most closely matches #osint.
Example 1: The peak hour for #dfir is 14:00-15:00 CT. If I start a conversation with the DFIR community using the #dfir hashtag, they will most likely see it during that hour.
Example 2: Although the peak hour for #dfir is 14:00, the hour where #dfir and #osint are nearly identical is 11:00-12:00 CT. If I want to start a conversation with #dfir and bring other #osint people along, 11:00 may be the smarter choice.
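Both examples can be sketched in a few lines, assuming you already have tweet counts per hour for each hashtag. The counts below are invented placeholders, and "nearly identical" is taken here to mean the smallest absolute difference in tweet count, which is only one possible reading.

```python
def peak_hour(hourly):
    """Hour (0-23) with the most tweets."""
    return max(hourly, key=hourly.get)


def closest_hour(a, b):
    """Hour where two hashtags' tweet counts differ the least.

    Ties are broken in favor of the earlier hour.
    """
    shared = a.keys() & b.keys()
    return min(shared, key=lambda h: (abs(a[h] - b[h]), h))


# Hypothetical tweets-per-hour counts (Central Time), not real results.
dfir = {10: 300, 11: 410, 14: 520, 15: 480}
osint = {10: 380, 11: 400, 14: 350, 15: 330}

best_for_dfir = peak_hour(dfir)
best_for_both = closest_hour(dfir, osint)
print(best_for_dfir, best_for_both)
```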
I know what you’re probably thinking. A lot of these communities are already involved in #osint. While that isn’t exactly false, the truth is actually quite surprising. Despite its high activity (total tweets), #dfir mentions #osint the least out of the 5 analyzed. #socmint, coming in dead last in overall activity, mentions #osint the most. This is followed by a weak second in #socialengineering, which is very close to the other 3.
What can I do with this information? I can determine that #socmint is not in need of engagement from the #osint community. The two are almost interchangeable when you compare #socmint’s overall tweets with its #osint references. But it also gives me useful information that will help me with my original goal. #dfir has high activity and high engagement, but hardly mentions #osint. Factoring in all other data, out of the 5 selected I’m likely to target that community first. This is followed by #socialengineering due to high activity, lower engagement, and relatively low mentions of #osint. This is valuable for other reasons (market share, for lack of a better term).
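One way to quantify how often a community mentions #osint is the share of its tweets that also carry the #osint hashtag. This is a minimal sketch with made-up tweets. It splits on whitespace, so a hashtag followed directly by punctuation (for example "#osint,") would be missed.

```python
def osint_share(tweets, hashtag, reference="#osint"):
    """Fraction of tweets containing `hashtag` that also contain `reference`."""
    tagged = [t.lower().split() for t in tweets]
    tagged = [words for words in tagged if hashtag in words]
    if not tagged:
        return 0.0
    return sum(reference in words for words in tagged) / len(tagged)


# Invented sample tweets, for illustration only.
sample_tweets = [
    "#socmint and #osint workflow",
    "#socmint basics",
    "#dfir triage notes",
    "#dfir memory forensics",
    "#dfir meets #osint",
    "#dfir timeline",
]
socmint_share = osint_share(sample_tweets, "#socmint")
dfir_share = osint_share(sample_tweets, "#dfir")
print(socmint_share, dfir_share)
```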
Once I resolve my web scraping issues, I’ll likely conduct an additional study looking at the hashtags shown to be the most closely related to #osint, not my arbitrary selections of hashtags based on personal experience. It will be interesting to see if closely related means high mentions of the hashtag #osint or if it means similar users, similar content, or something else.
With the information found in this initial study, however, I will soon plan ways that I, coming from the #osint community, can better interact with #dfir and #socialengineering. This might include inviting prominent people onto The OSINT Podcast or simply engaging more in the content they produce.
Through this study I’ve also found the top 25-100 accounts associated with each hashtag. This is based on total tweets sent mentioning the hashtag. I may refine these results and find the top 10 users who mention each hashtag with high frequency and have a high follower count in order to determine if they are “influencers” or not. This will also be useful when selecting podcast guests or choosing which content to engage with.
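Ranking accounts by total tweets is a simple frequency count. Here is a small sketch using hypothetical usernames. The follower-count refinement mentioned above is not included, since it would need data beyond the tweet text.

```python
from collections import Counter


def top_accounts(usernames, n):
    """Return the n most frequent usernames as (username, tweet_count) pairs."""
    return Counter(usernames).most_common(n)


# Hypothetical authors of tweets mentioning one hashtag.
authors = ["alice", "bob", "alice", "carol", "alice", "bob"]
top_two = top_accounts(authors, 2)
print(top_two)
```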
I hope you enjoyed this exploration of the dataset. Make sure to share or comment to help out the #osint community.