Twitter is a social media platform where users voice their opinions through short messages called tweets. This research project uses web scraping to gather and store tweets, then analyzes how changes in Bitcoin’s price affect people’s views on the cryptocurrency. To do this, we scraped Twitter’s main feed, used various APIs to enrich the data, and determined each tweet’s sentiment with analysis tools such as TextBlob. These sentiments were then compared with Bitcoin’s price fluctuations to identify trends and insights.
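As an illustration of the comparison step, here is a minimal sketch that correlates hourly average sentiment with the hourly change in price. It assumes each tweet has already been reduced to a `(timestamp, polarity)` pair and each price sample to a `(timestamp, price)` pair. The function names, the hourly bucketing, and the choice of Pearson correlation are assumptions made for this example, not necessarily what the project did.

```python
from collections import defaultdict
from statistics import mean


def hourly_mean(samples):
    """Average (timestamp, value) samples within each clock hour."""
    buckets = defaultdict(list)
    for timestamp, value in samples:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in buckets.items()}


def pearson(xs, ys):
    """Pearson correlation, or None when either series is constant."""
    mean_x, mean_y = mean(xs), mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5


def sentiment_price_correlation(tweets, prices):
    """Correlate each hour's mean sentiment with that hour's price change.

    tweets: iterable of (datetime, polarity) pairs
    prices: iterable of (datetime, price) pairs
    Returns None when there are too few overlapping hours.
    """
    sentiment = hourly_mean(tweets)
    price = hourly_mean(prices)
    # Only hours that have both tweets and price samples are usable.
    hours = sorted(set(sentiment) & set(price))
    # Price change is measured against the previous usable hour.
    pairs = [
        (sentiment[hour], price[hour] - price[previous])
        for previous, hour in zip(hours, hours[1:])
    ]
    if len(pairs) < 2:
        return None
    return pearson([s for s, _ in pairs], [change for _, change in pairs])
```

A value near 1 would suggest that sentiment rises in hours when the price rises, and a value near -1 the opposite. Neither says anything about which one drives the other.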
Challenges:
- Using the Selenium WebDriver along with BeautifulSoup to parse the HTML and translate it into Python objects;
- Overcoming twitter.com’s limit on HTTP requests from a single IP address by using a VPN;
- Avoiding rate limit errors from Tweepy (a Python client for the Twitter API) by setting the wait_on_rate_limit flag on the API client; and
- Setting up a headless web driver that runs in memory (with no UI) on AWS.
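Two of these points can be sketched briefly in code: the headless driver feeding BeautifulSoup, and the Tweepy rate-limit flag. This is a rough sketch rather than the project's actual code. It assumes Selenium 4 with Chrome, Tweepy 4.x, and the `bs4` package. The Chrome flags, function names, and parameter names are illustrative choices.

```python
# Chrome flags commonly used for headless runs on a server.
# These are an assumption, not the project's actual configuration.
HEADLESS_FLAGS = ("--headless", "--no-sandbox", "--disable-dev-shm-usage")


def fetch_page_soup(url, flags=HEADLESS_FLAGS):
    """Load a page in headless Chrome and return it parsed by BeautifulSoup."""
    # Imported here so the module still loads where Selenium is not installed.
    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for flag in flags:
        options.add_argument(flag)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        html = driver.page_source
    finally:
        driver.quit()  # always release the browser process
    return BeautifulSoup(html, "html.parser")


def make_tweepy_client(consumer_key, consumer_secret,
                       access_token, access_token_secret):
    """Build a Tweepy client that waits out rate limits instead of raising."""
    import tweepy

    auth = tweepy.OAuth1UserHandler(
        consumer_key, consumer_secret, access_token, access_token_secret
    )
    # With wait_on_rate_limit=True, Tweepy sleeps until the limit window resets.
    return tweepy.API(auth, wait_on_rate_limit=True)
```

The returned soup object can then be queried with the usual BeautifulSoup calls such as `find_all`. The selectors to use depend on Twitter's markup at the time of scraping, so none are shown here.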
Achievements:
- Created a reliable application that scraped Twitter’s news feed and users’ feeds and collected Bitcoin pricing data;
- Set up a MySQL database with a scalable architecture, supporting large quantities of entries as well as dynamic entities;
- Deployed the application on an AWS machine and connected it to a Dockerized Redash instance;
- Collected over 1 million records in under 12 hours of scraping; and
- Performed sentiment analysis on over 500k gathered tweets.
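For readers unfamiliar with TextBlob, scoring a single tweet might look roughly like the sketch below. `TextBlob(text).sentiment` returns a polarity in [-1, 1] and a subjectivity in [0, 1]. The `label_polarity` helper and its 0.05 threshold are hypothetical, added here only to show how a score could be turned into a category.

```python
def label_polarity(polarity, threshold=0.05):
    """Map a polarity score in [-1, 1] to a coarse label.

    The threshold is an illustrative assumption, not a TextBlob default.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def score_tweet(text):
    """Return TextBlob's polarity and subjectivity for one tweet, plus a label."""
    # Imported here so label_polarity stays usable without TextBlob installed.
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return {
        "polarity": sentiment.polarity,
        "subjectivity": sentiment.subjectivity,
        "label": label_polarity(sentiment.polarity),
    }
```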
Future work:
- Create a model to predict Bitcoin’s price;
- Learn how to differentiate between bot-generated and human-generated tweets;
- Introduce Twitter conversation chains into the data model; and
- Separate facts from opinions using Natural Language Processing.
Link to GitHub: https://github.com/martinhadid/twitter-scraper