twitterSentiment

Sentiment analysis of tweets using the Sentiment140 dataset from Kaggle: https://www.kaggle.com/kazanova/sentiment140

File structure required for the project:

project/  
  | input/  
  |   | dataset.csv  
  | controller.py  
  | evaluate.py  
  | predict.py  
  | read_data.py
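
As an illustration of what reading the input file might involve, here is a minimal sketch using only the standard library. It assumes that input/dataset.csv is the raw Sentiment140 file from Kaggle (no header row, six columns, latin-1 encoding, polarity 0 for negative and 4 for positive). The function name `load_tweets` is hypothetical and is not taken from read_data.py.

```python
import csv

# Column order of the raw Sentiment140 CSV, which ships without a header row.
COLUMNS = ["target", "ids", "date", "flag", "user", "text"]

# Sentiment140 encodes polarity as 0 (negative) and 4 (positive).
LABELS = {"0": 0, "4": 1}


def load_tweets(path="input/dataset.csv"):
    """Return a list of (label, text) pairs, with 0 = negative and 1 = positive.

    Hypothetical helper: the project's own read_data.py may work differently.
    """
    rows = []
    # Assumption: the file is not valid UTF-8, so decode it as latin-1.
    with open(path, newline="", encoding="latin-1") as handle:
        for record in csv.reader(handle):
            if len(record) != len(COLUMNS):
                continue  # skip malformed lines
            fields = dict(zip(COLUMNS, record))
            if fields["target"] not in LABELS:
                continue  # skip any polarity other than 0 or 4
            rows.append((LABELS[fields["target"]], fields["text"]))
    return rows
```

Remapping the labels to 0 and 1 is a convenience for the later steps, not something the dataset requires.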

To run the code: python3 controller.py

To change which parts of the code are run, edit controller.py.

This project was created for the cps803 class at Ryerson University. In it I use three models, LinearSVC, BernoulliNB and LogisticRegression, to categorise tweets into two sentiment categories. I explore different ways of preparing the data for these models and how well each preparation performs.
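
To make "preparing the data" concrete, here is a small sketch of one common preparation step: cleaning a raw tweet into lowercase word tokens. It is only an illustration. The README does not say which cleaning rules the project applies, so the function name `clean_tweet` and the rules themselves (dropping URLs, @mentions and non-letter characters) are assumptions.

```python
import re

# Assumed cleaning rules; the project's actual preprocessing may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, @mentions and non-letters, and split into tokens."""
    text = text.lower()
    text = URL_RE.sub(" ", text)        # remove links
    text = MENTION_RE.sub(" ", text)    # remove @username mentions
    text = NON_ALPHA_RE.sub(" ", text)  # remove digits, punctuation and emoticons
    return text.split()


if __name__ == "__main__":
    print(clean_tweet("@bob I LOVE this!! http://t.co/abc :)"))
```

Token lists like these can then be turned into whatever feature representation a given model expects.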

Warning: it takes a long time to run. Make sure you start it before making dinner or something.

Results

The negative word cloud generated by the preprocessed data:

The positive word cloud:

I wasn't kidding about how long it takes. This is only for training the models. It doesn't take into account all of the preprocessing.

Presented here are the results of the models on all data preparations. We see that the models do not perform well with word vector data.

The confusion matrix for the best performing model.
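
For readers unfamiliar with how a confusion matrix is built, here is a minimal pure-Python sketch for the two-class case. It assumes labels are encoded as 0 (negative) and 1 (positive), and the helper names are hypothetical. Rows are actual labels and columns are predicted labels.

```python
def confusion_matrix(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary labels 0 (negative) and 1 (positive)."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        # Row index is the actual class, column index is the predicted class.
        matrix[actual][predicted] += 1
    return matrix


def accuracy(matrix):
    """Fraction of predictions on the diagonal (correct) of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = matrix[0][0] + matrix[1][1]
    return correct / total if total else 0.0


if __name__ == "__main__":
    # Made-up labels for illustration only; these are not the project's results.
    example = confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
    print(example, accuracy(example))
```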

thomasyoung-audet/twitterSentiment

Folders and files

Latest commit

History

Repository files navigation

twitterSentiment

Results

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages