This application is for analyzing and storing Twitter data in real time.
Download and install Kafka 2.10-
Start Zookeeper
cd /path/to/kafka
bin/ config/
Start Kafka
bin/ config/
Create Kafka topic
bin/ --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic tweets
Create a Twitter user and create a Twitter app at
to get your Consumer key and Access Token.
Create Twitter Kafka pipeline to collect tweets with:
Run Kafka Twitter Producer with:
cd kafka-twitter-master
./gradlew run -Pargs="conf/producer.conf"
Install MySQL, log in, create a user for the app to use, and grant it permissions.
Then select the database:
mysql> USE default;
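The user-creation step above could look roughly like the following. This is only a sketch: the user name tweetapp, the host localhost, the password placeholder, and the database name twitter are hypothetical, and the privilege list is an assumption (Spark has to create and fill the tables, and a cleanup job has to delete rows). The function only prints SQL, so you can review it before sending it to the mysql client.

```shell
#!/bin/sh
# Print SQL that creates an app user and grants it access to one database.
# Arguments: user name, password, database name (all supplied by the caller).
# The privilege list is an assumption; widen or narrow it for your setup.
make_user_sql() {
    user=$1
    password=$2
    database=$3
    printf "CREATE USER '%s'@'localhost' IDENTIFIED BY '%s';\n" "$user" "$password"
    printf "GRANT SELECT, INSERT, DELETE, CREATE ON \`%s\`.* TO '%s'@'localhost';\n" "$database" "$user"
}

# Example with hypothetical values (run as a MySQL admin user):
#   make_user_sql tweetapp 'change-me' twitter | mysql -u root -p
```

The same database name and credentials then go into the connection settings the app uses.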
Spark creates the tables if it doesn't find them in the given database, but you can also create them yourself with these commands. Note that partition is a reserved word in MySQL 5.6 and later, so it is quoted with backticks here:
mysql> CREATE TABLE LDAResults(
    -> peak_at TIMESTAMP,
    -> LDA TEXT,
    -> hashtags TEXT );
mysql> CREATE TABLE Tweets(
    -> username TEXT,
    -> created_at TIMESTAMP,
    -> text TEXT,
    -> hashtags TEXT,
    -> lang TEXT,
    -> `partition` TIMESTAMP );
To keep the database from growing without bound, I suggest setting up a scheduled cleanup job that runs once a day and deletes data older than x days.
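One possible shape for such a cleanup, sketched as a small shell script driven by cron rather than as a MySQL scheduled event. The table and column names come from the CREATE TABLE statements above; the retention period, the database name twitter, the user tweetapp, and the script path in the cron line are hypothetical. The function only prints the DELETE statements, so nothing is removed until you pipe them into the mysql client.

```shell
#!/bin/sh
# Print DELETE statements that drop rows older than the given number of days.
# Assumes the Tweets.created_at and LDAResults.peak_at columns from this README.
make_cleanup_sql() {
    days=$1
    # Accept only a plain positive integer so nothing unexpected reaches the SQL.
    case "$days" in
        ''|*[!0-9]*)
            echo "usage: make_cleanup_sql DAYS" >&2
            return 1
            ;;
    esac
    printf 'DELETE FROM Tweets WHERE created_at < NOW() - INTERVAL %s DAY;\n' "$days"
    printf 'DELETE FROM LDAResults WHERE peak_at < NOW() - INTERVAL %s DAY;\n' "$days"
}

# Example with hypothetical names (keep 7 days of data):
#   make_cleanup_sql 7 | mysql -u tweetapp -p twitter
# Hypothetical crontab entry to run a script containing the line above daily at 03:00:
#   0 3 * * * /path/to/cleanup.sh
```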
Download and build Apache Spark. Version 1.4.0 was used to develop this app.
Download mysql-connector-java.jar and add it to the Spark classpath. Version 5.1.36 was used.
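One way to do this on Spark 1.x is the SPARK_CLASSPATH environment variable, set in the shell that launches Spark or in conf/spark-env.sh. This is a sketch under assumptions: the jar path is a placeholder, and whether this app relies on SPARK_CLASSPATH or on another mechanism is not stated here.

```shell
#!/bin/sh
# Assumption: the connector jar was saved at this placeholder path.
MYSQL_CONNECTOR_JAR=/path/to/mysql-connector-java.jar

# Spark's launch scripts read SPARK_CLASSPATH and add it to the JVM classpath.
SPARK_CLASSPATH=$MYSQL_CONNECTOR_JAR
export SPARK_CLASSPATH
```

Spark 1.x prints a deprecation warning for SPARK_CLASSPATH; passing the jar to spark-submit with --driver-class-path and --jars is the alternative it suggests.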
Set your MySQL database information in Launcher.scala and LDAObject.scala, then build the application jar:
cd spark-twitter-lda/
sbt assembly
Launch Apache Spark standalone server.
cd /path/to/spark
sbin/
For peak detection to work correctly, start the application on an even 10-minute mark (for example 12:00, 12:10, or 12:20). Launch the app with:
./bin/spark-submit --class main.scala.Launcher --master spark://localhost.localdomain:7077 /path/to/sparktwitterlda.jar
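If you don't want to watch the clock, a small helper can wait for the next mark before launching. This is a sketch that assumes "even 10 minutes" means a wall-clock minute divisible by ten; the function name is made up for illustration.

```shell
#!/bin/sh
# Seconds to wait until the next wall-clock minute divisible by ten.
# Arguments: current minute and second, as printed by `date '+%M %S'`.
seconds_until_next_mark() {
    # Strip one leading zero so values like "08" are not parsed as octal.
    minute=${1#0}; minute=${minute:-0}
    second=${2#0}; second=${second:-0}
    elapsed=$(( ($minute % 10) * 60 + $second ))
    # Returns 0 when called exactly on a mark.
    echo $(( (600 - $elapsed) % 600 ))
}

# Example: wait for the mark, then launch with the command from this README.
# The date output is left unquoted on purpose so it splits into two arguments.
#   sleep "$(seconds_until_next_mark $(date '+%M %S'))"
#   ./bin/spark-submit --class main.scala.Launcher --master spark://localhost.localdomain:7077 /path/to/sparktwitterlda.jar
```

Spark itself takes a few seconds to start, so the app begins processing slightly after the mark.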
For data visualization you can use anything that supports MySQL connections. I created a simple graph with d3.js and PHP.