This project focuses on binary sentiment classification using book reviews. Given the text of a review, the goal is to predict whether the sentiment is positive or negative. The classifier is a feedforward neural network trained on TF-IDF features of the review text.
This work is part of my learning journey through the Break Through Tech AI Program, where I applied machine learning concepts in natural language processing (NLP).
- Problem Type: Binary classification
- Input: Raw text from book reviews
- Output: Sentiment label (positive or negative)
- Tokenized and cleaned text reviews
- Removed stopwords and punctuation
- Transformed text using `TfidfVectorizer` with:
  - Custom `max_df` and `min_df` thresholds
  - Max features set to 3000
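
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's exact code: only `max_features=3000` comes from this write-up, while the sample reviews, the `max_df` / `min_df` values, and the built-in English stop-word list are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample reviews; the real project uses a book-review dataset.
reviews = [
    "I loved this book, the characters were wonderful!",
    "A dull plot and flat characters. Not recommended.",
    "Wonderful writing and a gripping story.",
    "The story was boring and far too long.",
]

# max_features=3000 comes from the write-up; the max_df / min_df values
# and the stop-word list are assumptions chosen for illustration.
vectorizer = TfidfVectorizer(
    lowercase=True,
    stop_words="english",  # drop common English stopwords
    max_df=0.9,            # ignore terms found in more than 90% of reviews
    min_df=1,              # keep terms found in at least 1 review
    max_features=3000,     # cap the vocabulary size
)

# Sparse matrix with one row per review and one column per kept term.
X = vectorizer.fit_transform(reviews)
print(X.shape)
```

The default tokenizer pattern of `TfidfVectorizer` keeps only word characters, so punctuation is dropped without a separate cleaning step.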
- Model Type: Feedforward Neural Network using `tensorflow.keras`
- Architecture:
  - Dense hidden layers with ReLU activation
  - Dropout regularization
  - Final sigmoid layer for binary output
- Trained on the TF-IDF transformed text data
- Used binary crossentropy loss and SGD optimizer
- Saved both the trained model and the vectorizer using:
  - `model.save()` from TensorFlow
  - `pickle.dump()` for the TF-IDF vectorizer
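
A minimal sketch of a model along these lines is shown below. The layer sizes, dropout rate, and learning rate are assumptions (this write-up does not state them), and `build_model` and `save_artifacts` are hypothetical helper names used only for illustration.

```python
import pickle

import tensorflow as tf


def build_model(input_dim: int = 3000) -> tf.keras.Model:
    """Feedforward binary classifier over TF-IDF features.

    Layer sizes, dropout rate, and learning rate are illustrative
    assumptions, not the project's actual values.
    """
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.25),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dropout(0.25),
        # Single sigmoid unit: output is the probability of class 1.
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.1),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model


def save_artifacts(model, vectorizer, model_path, vectorizer_path):
    """Persist the Keras model and the fitted TF-IDF vectorizer."""
    model.save(model_path)
    with open(vectorizer_path, "wb") as f:
        pickle.dump(vectorizer, f)
```

When training, the sparse TF-IDF matrix usually needs to be converted to a dense array first (for example with `.toarray()`) before it is passed to `model.fit()`.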
- Accuracy, precision, recall, and F1-score calculated on test data
- Plotted the precision-recall curve to analyze classifier performance
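
As a rough illustration of the evaluation step, the metrics can be computed with scikit-learn as sketched here. The labels and probabilities are made-up values, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    precision_recall_curve,
    precision_recall_fscore_support,
)

# Hypothetical true labels and predicted probabilities for six test reviews.
y_true = np.array([1, 0, 1, 1, 0, 0])
y_prob = np.array([0.9, 0.2, 0.6, 0.4, 0.7, 0.1])

# Threshold the sigmoid outputs at 0.5 (an assumed cut-off) to get labels.
y_pred = (y_prob >= 0.5).astype(int)

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary"
)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")

# Precision and recall at every candidate threshold; these arrays are
# what a precision-recall plot is drawn from.
pr_precision, pr_recall, thresholds = precision_recall_curve(y_true, y_prob)
```

Plotting `pr_recall` against `pr_precision` with matplotlib (for example `plt.plot(pr_recall, pr_precision)`) gives the precision-recall curve.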
- Python
- pandas, numpy
- scikit-learn (`TfidfVectorizer`, metrics)
- TensorFlow / Keras (modeling)
- matplotlib, seaborn (visualizations)
- pickle (persistence)
- Implemented a custom neural network for NLP tasks
- Tuned TF-IDF vectorizer parameters to improve model generalization
- Understood the importance of precision-recall trade-offs
- Learned to persist and reload models and vectorizers for deployment use cases
- Load the saved TF-IDF vectorizer from the `.pkl` file
- Load the neural network using `tensorflow.keras.models.load_model()`
- Pass new text data through the vectorizer, then predict with the model
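
The three steps above might look something like the sketch below. `load_artifacts` and `predict_sentiment` are hypothetical helper names, and the code assumes that label 1 means positive and that 0.5 is the decision threshold; neither is stated in this write-up.

```python
import pickle

import numpy as np


def load_artifacts(vectorizer_path, model_path):
    """Load the pickled vectorizer and the saved Keras model."""
    # Imported here so predict_sentiment can be used without TensorFlow.
    from tensorflow.keras.models import load_model

    with open(vectorizer_path, "rb") as f:
        vectorizer = pickle.load(f)
    model = load_model(model_path)
    return vectorizer, model


def predict_sentiment(texts, vectorizer, model, threshold=0.5):
    """Return (label, probability) pairs for a list of raw review strings."""
    features = vectorizer.transform(texts)
    # Keras models generally expect dense arrays, so densify sparse output.
    if hasattr(features, "toarray"):
        features = features.toarray()
    probs = np.asarray(model.predict(features)).reshape(-1)
    # Assumption: class 1 is "positive", class 0 is "negative".
    return [
        ("positive" if p >= threshold else "negative", float(p))
        for p in probs
    ]
```

A call would then be along the lines of `vectorizer, model = load_artifacts("tfidf_vectorizer.pkl", "sentiment_model.keras")` followed by `predict_sentiment(["A wonderful read."], vectorizer, model)`, where both file names are placeholders. Only unpickle files from a trusted source, since `pickle.load` can execute arbitrary code.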
This project helped me solidify my understanding of:
- NLP text vectorization techniques
- Neural network design and optimization
- Model persistence for real-world ML pipelines
- Evaluation strategies beyond accuracy