I carried out this project during a research internship under the guidance of Dr. L. Bhera, IIT Kanpur.
The Dataset folder contains the Twitter tweets.
The train file contains three columns: tweet_id, sentiment and tweet_text.
- tweet_id : unique for every tweet.
- sentiment : three types - negative, neutral and positive.
- tweet_text : text of the tweet whose sentiment is to be analysed.
The test_sample file has two columns: tweet_id and tweet_text.
- tweet_id: unique for every tweet
- tweet_text: text of the tweet whose sentiment (negative, neutral or positive) is to be predicted.
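
The column layout described above can be sketched as follows. This is only an illustration: the README does not state the file format, so comma-separated data with a header row is assumed, and the sample rows, `TRAIN_CSV` and `load_train` are hypothetical names invented for the example.

```python
import csv
import io

# Hypothetical in-memory stand-in for the train file. The real file name and
# delimiter are not given in this README; comma-separated data is assumed.
TRAIN_CSV = """tweet_id,sentiment,tweet_text
1,positive,I love this phone
2,negative,Worst service ever
3,neutral,The store opens at nine
"""

# The three sentiment classes listed in this README.
VALID_SENTIMENTS = {"negative", "neutral", "positive"}


def load_train(handle):
    """Read train rows and check that each carries a known sentiment label."""
    rows = []
    for row in csv.DictReader(handle):
        if row["sentiment"] not in VALID_SENTIMENTS:
            raise ValueError(f"unexpected sentiment: {row['sentiment']!r}")
        rows.append(row)
    return rows


train_rows = load_train(io.StringIO(TRAIN_CSV))
texts = [r["tweet_text"] for r in train_rows]
labels = [r["sentiment"] for r in train_rows]
```

For the real data, the `io.StringIO` object would be replaced by an opened file handle; the test_sample file could be read the same way without the sentiment check.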
Three approaches were taken:
- Logistic Regression
- LSTM with GloVe word embeddings
- Bidirectional LSTM with GloVe word embeddings
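
As a minimal sketch of the first approach, a logistic regression baseline could look like the code below. The README does not say which text features were used, so the TF-IDF features (with unigrams and bigrams) are an assumption, as are the toy tweets and the scikit-learn pipeline itself; this is not necessarily the setup that produced the reported results.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy tweets invented for illustration only; not taken from the dataset.
train_texts = [
    "I love this phone",
    "what a great day",
    "worst service ever",
    "this is terrible",
    "the store opens at nine",
    "meeting moved to monday",
]
train_labels = [
    "positive", "positive",
    "negative", "negative",
    "neutral", "neutral",
]

# Assumed feature choice: TF-IDF over unigrams and bigrams.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

# Predict one of the three sentiment labels for unseen text.
predictions = model.predict(["I love it", "this is terrible news"])
```

Keeping the vectorizer and classifier in one pipeline means the same preprocessing is applied at training and prediction time.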
Accuracy achieved on the Kaggle test_sample:
- Logistic Regression - 65%
- LSTM with GloVe word embeddings - 67.5%
- Bidirectional LSTM with GloVe word embeddings - 67%
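
For reference, the figures above are assumed to be plain classification accuracy, meaning the fraction of tweets whose predicted label matches the true label; the README does not name the exact Kaggle metric. A minimal sketch with invented labels:

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Invented labels for illustration: 3 of the 4 predictions are correct.
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "neutral", "neutral", "positive"]
score = accuracy(y_true, y_pred)
```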