ITC

Using Data Mining for #Bitcoin Sentiment Analysis

February 3, 2020, 2:17 pm, ITC Publications

Abstract

Twitter is a social media platform where users voice their opinions in short messages called tweets. This research project uses web scraping to gather and store tweets in order to analyze how changes in Bitcoin’s price affect people’s views of the cryptocurrency. To do this, we scraped Twitter’s main feed, used various APIs to collect additional attributes for the data, and determined each tweet’s sentiment with analysis tools such as TextBlob. These sentiments were then compared with Bitcoin’s price fluctuations to identify trends and insights.
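The comparison between sentiment and price can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project’s code: it assumes every tweet already has a polarity score in [-1, 1] (the range TextBlob’s `sentiment.polarity` returns), averages the scores per day, and computes the Pearson correlation between daily mean polarity and Bitcoin’s daily percentage price change. All function names and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date
from math import sqrt


def daily_mean_polarity(scored_tweets):
    """Average polarity per day from (day, polarity) pairs."""
    buckets = defaultdict(list)
    for day, polarity in scored_tweets:
        buckets[day].append(polarity)
    return {day: sum(vals) / len(vals) for day, vals in buckets.items()}


def daily_returns(closing_prices):
    """Percentage change between consecutive daily closing prices.

    The first day has no previous price, so it gets no return.
    """
    days = sorted(closing_prices)
    return {
        today: (closing_prices[today] - closing_prices[prev]) / closing_prices[prev] * 100
        for prev, today in zip(days, days[1:])
    }


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two paired observations")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def sentiment_price_correlation(scored_tweets, closing_prices):
    """Correlate daily mean sentiment with the same day's price change."""
    sentiment = daily_mean_polarity(scored_tweets)
    returns = daily_returns(closing_prices)
    days = sorted(set(sentiment) & set(returns))  # only days present in both series
    return pearson([sentiment[d] for d in days], [returns[d] for d in days])


# Hypothetical toy data: (day, polarity) pairs and daily closing prices in USD.
tweets = [
    (date(2020, 1, 1), 0.2),
    (date(2020, 1, 2), 0.4),
    (date(2020, 1, 2), 0.6),
    (date(2020, 1, 3), -0.5),
    (date(2020, 1, 4), 0.5),
]
prices = {
    date(2020, 1, 1): 100.0,
    date(2020, 1, 2): 110.0,
    date(2020, 1, 3): 99.0,
    date(2020, 1, 4): 108.9,
}
print(sentiment_price_correlation(tweets, prices))  # ≈ 1.0 for this toy data
```

Same-day correlation is only one possible design; lagging the sentiment series by a day or more would test whether opinion leads or follows the price.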

Challenges

Understanding the structure of Twitter’s UI, a complex JavaScript-generated application;
Using the Selenium web driver together with BeautifulSoup to parse the rendered HTML and translate it into Python objects;
Working around twitter.com’s HTTP request limits for a single IP address by using a VPN;
Avoiding rate-limit errors from Tweepy (a Python client for the Twitter API) by setting the wait_on_rate_limit flag when creating the API object; and
Setting up a headless web driver (one that runs in memory, with no UI) on AWS.
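Tweepy’s `wait_on_rate_limit` option makes the client sleep until the rate-limit window resets instead of failing. The same idea can be applied to any rate-limited call, such as a scraping request, with the standard library alone. The helper below is a hypothetical sketch, not Tweepy’s implementation: `RateLimited` is an assumed exception type carrying the Unix timestamp at which the window resets, and `clock` and `sleep` are injectable so the logic can be tested without actually waiting.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised by a fetch function when a rate limit is hit."""

    def __init__(self, reset_at):
        super().__init__(f"rate limited until {reset_at}")
        self.reset_at = reset_at  # Unix timestamp when the limit window resets


def call_with_rate_limit(fetch, max_retries=3, clock=time.time, sleep=time.sleep, margin=1.0):
    """Call fetch(), sleeping until the reset time whenever it is rate limited.

    Re-raises the last RateLimited error after max_retries retries.
    """
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except RateLimited as err:
            if attempt == max_retries:
                raise
            # Wait until the window resets, plus a small safety margin.
            sleep(max(0.0, err.reset_at - clock()) + margin)
```

With Tweepy itself, the equivalent is passing `wait_on_rate_limit=True` when constructing `tweepy.API`; a helper like this is only needed for calls Tweepy does not manage, such as requests made through the scraper.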

Achievements

Created a reliable application that scraped Twitter’s news feed and individual users’ feeds and collected Bitcoin pricing data;
Set up a MySQL database with a scalable architecture that supports large numbers of entries as well as dynamic entities;
Deployed the application on an AWS machine and connected it to a Dockerized Redash instance;
Collected over 1 million records in under 12 hours of scraping; and
Performed sentiment analysis on over 500k gathered tweets.
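The database is described here only at a high level. The sketch below shows one plausible layout for users, tweets with their sentiment scores, and daily Bitcoin prices, plus a query that joins daily average sentiment to the closing price. It uses Python’s built-in `sqlite3` module as a stand-in for MySQL so it runs without a server; all table and column names are hypothetical, and the repository’s actual schema may differ.

```python
import sqlite3

# Hypothetical schema; in MySQL the types would differ (e.g. BIGINT, DATETIME).
SCHEMA = """
CREATE TABLE users (
    user_id     INTEGER PRIMARY KEY,
    handle      TEXT NOT NULL UNIQUE,
    followers   INTEGER
);
CREATE TABLE tweets (
    tweet_id    INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL REFERENCES users(user_id),
    created_at  TEXT NOT NULL,      -- ISO-8601 timestamp
    text        TEXT NOT NULL,
    polarity    REAL                -- sentiment in [-1, 1], NULL until analyzed
);
CREATE TABLE btc_prices (
    price_date  TEXT PRIMARY KEY,   -- ISO-8601 date
    close_usd   REAL NOT NULL
);
CREATE INDEX idx_tweets_created_at ON tweets(created_at);
"""


def create_db(path=":memory:"):
    """Open a database and create the tables, enforcing foreign keys."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def daily_sentiment_vs_price(conn):
    """Return (date, average polarity, closing price) for days with scored tweets."""
    query = """
        SELECT p.price_date, AVG(t.polarity), p.close_usd
        FROM btc_prices AS p
        JOIN tweets AS t ON substr(t.created_at, 1, 10) = p.price_date
        WHERE t.polarity IS NOT NULL
        GROUP BY p.price_date, p.close_usd
        ORDER BY p.price_date
    """
    return conn.execute(query).fetchall()
```

Keeping the sentiment score as a nullable column lets scraping and analysis run as separate stages: tweets are stored first and scored later.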

Roadmap

Create a model to predict Bitcoin’s price;
Learn how to differentiate between bot-generated and human-generated tweets;
Introduce Twitter conversation chains into the data model; and
Separate facts from opinions using Natural Language Processing.

Link to GitHub: https://github.com/martinhadid/twitter-scraper

Authors

Martin Hadid and David Melul Fresco
