Twint is an advanced Twitter scraping tool written in Python that allows for scraping Tweets from Twitter profiles without using Twitter's API. That said, it does violate Twitter's TOS, so it should be used with caution; I recommend research purposes only. Twint solves one big problem with Twitter: the API limits. Currently, Twitter's API only allows you to pull a user's last 3,200 Tweets. That is a serious limitation for a number of reasons, but especially for accounts that tweet very frequently or for legacy accounts going back years (or over a decade!).

The developer of Twint is Francesco Poldi. He and I have worked on many projects together, including JungleScam and Platypus. He is also working on Intelgram, which is currently in private beta. Twint, however, is probably the OSINT tool I use most in my workflow, primarily because it's so easy to use and allows for an insane amount of customization with very limited Python experience.
Twitter is one of the largest social media platforms and is the only platform that openly allows for scraping without advanced infrastructure. You can write a web scraper for it yourself within 30 minutes if you know where to start. Twitter is also home to breaking news, important conversations, and the biggest controversies. It’s also where politics happens, or doesn’t. As an OSINT analyst, collector, or investigator, ignoring Twitter is likely to put a huge hole in your data set. Even if accounts you monitor have been banned on Twitter, conversations about those individuals or groups still occur on the platform and provide context. Twitter will also generate leads to other platforms that are not as accessible. You can find Pastebin sites, Telegram groups, Discord channels, onion links, or whatever else you’re looking for. Bottom line: don’t ignore Twitter.
Francesco has put a great deal of effort into making Twint as easy to set up as possible. It used to be that Twint required a few steps to get up and running; now, basic use takes seconds and advanced use can be set up in a few hours. The only prerequisites are Python 3.x and Git. To get started, run these commands in the command prompt:
- git clone https://github.com/twintproject/twint.git
- cd twint
- python3 setup.py install
- twint -u [insert username]
Just like that, you have the script pulling all Tweets from the specified username. Now that Twint is installed, however, there is much more it can do. Here are a few other options/examples, though there are plenty more:
twint -u username – Scrape all the Tweets from the user's timeline.
twint -u username -s pineapple – Scrape all Tweets from the user's timeline containing pineapple.
twint -s pineapple – Collect every Tweet containing pineapple from everyone's Tweets.
twint -u username --year 2014 – Collect Tweets that were tweeted before 2014.
twint -u username --since 2015-12-20 – Collect Tweets that were tweeted since 2015-12-20.
twint -u username -o file.txt – Scrape Tweets and save to file.txt.
twint -u username -o file.csv --csv – Scrape Tweets and save as a csv file.
twint -u username --email --phone – Show Tweets that might contain phone numbers or email addresses.
twint -s "Donald Trump" --verified – Display Tweets by verified users that Tweeted about Donald Trump.
twint -g="48.880048,2.385939,1km" -o file.csv --csv – Scrape Tweets from within a 1 km radius of a location in Paris and export them to a csv file.
Refer to the Wiki for all available commands and expected outputs.
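Those same commands can also be driven from Python by importing Twint as a module, which is the basis for the custom workflows described below. Here is a minimal sketch of the `-s pineapple -o file.csv --csv` example, using Twint's `Config` object (field names per the project Wiki; actually running it requires Twint installed and a network connection):

```python
import twint

# Configure a search equivalent to:
#   twint -s pineapple -o file.csv --csv
c = twint.Config()
c.Search = "pineapple"   # keyword to search for
c.Store_csv = True       # export results as CSV
c.Output = "file.csv"    # destination file
c.Limit = 100            # cap the number of Tweets collected

# Run the search (scrapes Twitter live, so it needs network access)
twint.run.Search(c)
```

Swap `c.Search` for `c.Username` and `twint.run.Search` for `twint.run.Profile` to scrape a timeline instead of a keyword.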
Getting started with Twint is easy; becoming a power user of the tool, however, is a bit harder. If you want to take this tool to the next level, I recommend learning how to build a custom module using Twint and Python. This eliminates the need to run the same commands over and over and allows you to build custom tools for specific workflows. For example, I once wrote a tool using Twint as a module that would scrape Tweets containing a certain keyword and export them as a csv. Then, I added a part to the script that extracted the "username" column of the csv and downloaded the followers of each username to another csv. Finally, I added a part to the script which would download all Tweets from the followers of the usernames extracted. This created an ontology of the original keyword I specified within a particular date range. Here's that breakdown visually for clarity.
all tweets for keyword #1 over the last 30 days > usernames of all accounts mentioning that keyword > followers of each username extracted > all tweets for followers of usernames extracted > large dataset of related activity > analysis = ontology.
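The middle step of that chain, pulling the "username" column out of a Twint CSV export, is plain CSV handling. A minimal sketch with the standard library, assuming a `username` column as in Twint's default CSV output (the toy sample is a stand-in; real exports have many more columns):

```python
import csv
import io

def extract_usernames(csv_text):
    """Return the unique usernames from a Twint CSV export, sorted."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted({row["username"] for row in reader})

# Toy stand-in for a Twint export
sample = (
    "id,username,tweet\n"
    "1,alice,pineapple pizza\n"
    "2,bob,pineapple tart\n"
    "3,alice,more pineapple\n"
)
print(extract_usernames(sample))  # ['alice', 'bob']
```

Each username in that list can then be fed back into Twint to collect followers and their timelines, producing the dataset described above.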