Sentiment analysis of tweet dataset using multiple ML models

thomasyoung-audet/twitterSentiment

twitterSentiment

Sentiment analysis of tweets using the Sentiment140 dataset from Kaggle: https://www.kaggle.com/kazanova/sentiment140

File structure required for the project:

project/  
  | input/  
  |   | dataset.csv  
  | controller.py  
  | evaluate.py  
  | predict.py  
  | read_data.py  
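
As a rough sketch of how the file in input/ could be read (this is not the code in read_data.py), the example below assumes dataset.csv is the unmodified Kaggle file: no header row, six columns (target, id, date, flag, user, text), and a target of 0 for negative or 4 for positive. The `load_tweets` name and the remapping of labels to 0/1 are assumptions made for illustration.

```python
import csv
import io

# Column layout of the raw Sentiment140 CSV (the file has no header row).
COLUMNS = ["target", "id", "date", "flag", "user", "text"]


def load_tweets(handle):
    """Return (texts, labels) from an open Sentiment140-style CSV handle.

    Hypothetical helper: labels are remapped from the dataset's 0/4 coding
    to 0 (negative) and 1 (positive).
    """
    texts, labels = [], []
    for row in csv.reader(handle):
        if len(row) != len(COLUMNS):
            continue  # skip malformed rows
        target = int(row[0])
        if target not in (0, 4):
            continue  # keep only the two sentiment classes
        texts.append(row[5])
        labels.append(1 if target == 4 else 0)
    return texts, labels


# Tiny in-memory sample in the same layout as the real file.
sample = io.StringIO(
    '"0","1","Mon Apr 06","NO_QUERY","user_a","so sad today"\n'
    '"4","2","Mon Apr 06","NO_QUERY","user_b","so happy today"\n'
)
texts, labels = load_tweets(sample)
print(texts, labels)
```

For the real file, the handle would come from something like `open("input/dataset.csv", encoding="latin-1", newline="")`, since the Kaggle CSV is not UTF-8 encoded.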

To run the code: python3 controller.py

To change which parts of the code are run, edit controller.py.

This project was created for the cps803 class at Ryerson University. In it I use three models (LinearSVC, BernoulliNB and LogisticRegression) to categorise tweets into two sentiment categories. I explore different ways of preparing the data for these models and how well each preparation performs.
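
A minimal sketch of how the three models can be trained on the same text features with scikit-learn. The binary bag-of-words vectoriser, the `build_models` helper and the toy tweets are assumptions for illustration only; the data preparations actually compared in this project live in the repository code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC


def build_models():
    """Return one vectoriser + classifier pipeline per model (hypothetical helper)."""
    return {
        "LinearSVC": make_pipeline(CountVectorizer(binary=True), LinearSVC()),
        "BernoulliNB": make_pipeline(CountVectorizer(binary=True), BernoulliNB()),
        "LogisticRegression": make_pipeline(
            CountVectorizer(binary=True), LogisticRegression(max_iter=1000)
        ),
    }


# Toy data: 0 = negative, 1 = positive.
train_texts = [
    "i love this sunny day",
    "what a great happy morning",
    "best day ever so happy",
    "i hate this awful rain",
    "what a terrible sad morning",
    "worst day ever so sad",
]
train_labels = [1, 1, 1, 0, 0, 0]

predictions = {}
for name, model in build_models().items():
    model.fit(train_texts, train_labels)
    predictions[name] = list(model.predict(["such a happy day"]))

print(predictions)
```

Wrapping the vectoriser and classifier in one pipeline keeps the vocabulary fitted on training data only, which matters once the data is split into train and test sets.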

Warning: it takes a long time to run. Make sure you start it before making dinner or something.

Results

The negative word cloud generated from the preprocessed data: results

The positive word cloud: results

I wasn't kidding about how long it takes. This is only the time for training the models. It doesn't take into account any of the preprocessing.

timing
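
Training times like these can be collected with a small wrapper around each training call. This is only a sketch using the standard library; `time_call` is a hypothetical helper, not a function from this repository.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; in practice fn would be something like model.fit.
result, seconds = time_call(sorted, [3, 1, 2])
print(f"took {seconds:.6f} s")
```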

Presented here are the results of the models on all data preparations. We see that the models do not perform well with word vector data.

results

The confusion matrix for the best performing model.

matrix
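
For reference, a two-class confusion matrix can be computed as in the sketch below. It assumes labels are coded 0 (negative) and 1 (positive), with rows for true labels and columns for predicted labels; the `confusion_matrix` helper here is a plain-Python stand-in written for illustration, and the toy labels are not results from this project.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Return matrix[i][j] = count of samples with true labels[i], predicted labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

matrix = confusion_matrix(y_true, y_pred)
# Accuracy is the sum of the diagonal divided by the number of samples.
accuracy = sum(matrix[i][i] for i in range(len(matrix))) / len(y_true)

print(matrix)    # [[1, 1], [1, 2]]
print(accuracy)  # 0.6
```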
