Sentiment analysis of IMDb reviews using TensorFlow Word2Vec
The IMDb Movie Reviews dataset is a binary sentiment analysis dataset consisting of 50,000 reviews from the Internet Movie Database (IMDb), each labeled as positive or negative. The dataset contains an equal number of positive and negative reviews, and only highly polarized reviews are included. The project downloads the dataset automatically, so there is no need to add it to the project manually.
Framework used: TensorFlow
Network topology:
SentimentModel = Sequential([
    wordVectorizePredicates,
    Embedding(vocabularies, embeddingDimension, name="embedding"),
    Conv1D(filters=40, kernel_size=3, padding='same', activation='relu'),
    Dropout(0.2),
    Flatten(),
    Dense(20, activation='relu'),
    Dense(1, activation='sigmoid')
])
Accuracy achieved: 87%
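To get a feel for where the trainable weights in this topology sit, the output shapes and parameter counts can be worked out by hand. The sketch below does that arithmetic in plain Python. The three sizes at the top are hypothetical placeholders, since the README does not state the values of vocabularies, embeddingDimension, or the padded review length.

```python
# Hypothetical sizes: the README does not state these values.
VOCAB_SIZE = 10_000      # stands in for `vocabularies`
EMBEDDING_DIM = 16       # stands in for `embeddingDimension`
SEQUENCE_LENGTH = 100    # assumed tokens per review after padding/truncation


def layer_summary(vocab_size, embedding_dim, sequence_length):
    """Return (layer name, output shape, trainable parameters) per layer."""
    filters, kernel_size = 40, 3
    # padding='same' with the default stride of 1 keeps the sequence length.
    flat = sequence_length * filters
    return [
        ("Embedding", (sequence_length, embedding_dim), vocab_size * embedding_dim),
        ("Conv1D", (sequence_length, filters),
         kernel_size * embedding_dim * filters + filters),
        ("Dropout", (sequence_length, filters), 0),
        ("Flatten", (flat,), 0),
        ("Dense(20)", (20,), flat * 20 + 20),
        ("Dense(1)", (1,), 20 * 1 + 1),
    ]


summary = layer_summary(VOCAB_SIZE, EMBEDDING_DIM, SEQUENCE_LENGTH)
total_parameters = sum(params for _, _, params in summary)

for name, shape, params in summary:
    print(f"{name:<10} {str(shape):<12} {params:>9,}")
print(f"{'Total':<23} {total_parameters:>9,}")
```

Changing the placeholder sizes shows how the embedding table scales with the vocabulary and how the first dense layer scales with the review length, because Flatten feeds it every time step of every filter.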
- String preprocessing
- Word vectorisation
- Generating word embeddings
- Classification of reviews from the test folder as good or bad
- Saving the weights with a checkpoint
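The first two steps in the list, string preprocessing and word vectorisation, can be illustrated without any framework. The project appears to do this inside the model through the wordVectorizePredicates layer, so the code below is only a plain-Python stand-in for the idea, not the project's implementation. The function names, the reserved ids, and the cleaning rules (lowercasing, dropping HTML line breaks, stripping punctuation) are all assumptions made for illustration.

```python
import re
import string
from collections import Counter

PAD, UNK = 0, 1  # assumed reserved ids: padding and out-of-vocabulary words


def preprocess(text):
    """Lowercase, replace HTML line breaks with spaces, strip punctuation, split."""
    text = text.lower()
    text = re.sub(r"<br\s*/?>", " ", text)
    text = re.sub(f"[{re.escape(string.punctuation)}]", "", text)
    return text.split()


def build_vocabulary(reviews, max_tokens):
    """Map the most frequent words to integer ids, starting after PAD and UNK."""
    counts = Counter(word for review in reviews for word in preprocess(review))
    most_common = counts.most_common(max_tokens - 2)
    return {word: index for index, (word, _) in enumerate(most_common, start=2)}


def vectorize(text, vocabulary, sequence_length):
    """Encode a review as a fixed-length list of word ids."""
    ids = [vocabulary.get(word, UNK) for word in preprocess(text)]
    ids = ids[:sequence_length]                       # truncate long reviews
    return ids + [PAD] * (sequence_length - len(ids))  # pad short ones


reviews = ["A great movie.<br />Great cast!", "A dull, boring movie."]
vocabulary = build_vocabulary(reviews, max_tokens=50)
encoded = vectorize("Great movie, terrible ending", vocabulary, sequence_length=6)
print(encoded)
```

Words that never appeared when the vocabulary was built ("terrible", "ending") map to the UNK id, and the review is padded to the fixed length that an embedding layer would expect.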
- Run
  python train_NLP.py
  to download the dataset and train the network.
- Run
  python test_NLP.py
  to predict the sentiment of reviews from the test folder.
- utils.py has all the logic for
  - Downloading the dataset
  - Preprocessing the data
  - Training the network
  - Saving a checkpoint of the trained weights
- IMDBRating_sentiment_analysis.ipynb records all the attempts made to find the best network for the classification
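As a rough illustration of the data-handling step attributed to utils.py above, the sketch below reads labeled reviews from disk once the dataset has been downloaded and extracted. It is not taken from utils.py. The function name load_reviews is hypothetical, and the layout it expects (a split folder containing pos and neg subfolders of .txt files) is an assumption based on how the IMDb dataset is commonly distributed.

```python
from pathlib import Path


def load_reviews(split_dir):
    """Read every review under split_dir/neg and split_dir/pos.

    Assumes one review per .txt file. Returns parallel lists of
    texts and labels, where 1 means positive and 0 means negative.
    """
    texts, labels = [], []
    for label_name, label in (("neg", 0), ("pos", 1)):
        for path in sorted(Path(split_dir, label_name).glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

A call such as load_reviews("aclImdb/test") would then return the texts and labels for the test split, assuming the archive was extracted into a folder with that name.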