GOAL: Classify emails as spam or not-spam on the basis of the message.
DATASET: E-Mail classification NLP
WHAT I HAD DONE:
I have started with simple Exploratory Data Analysis(EDA), looked for some null or duplicate vales. Then splited the training and testing dataset using sklearn. Then I implemented different models to classify the message as spam or not-spam.
MODELS USED:
- Naive Bayes (Multinominal)
- Random Forest Classifier
- XG Boost Classifier
LIBRARIES NEEDED:
- Numpy
- Pandas
- Sklearn
CONCLUSION:
Models | Accuracy on Training Set | Accuracy on Testing Set |
---|---|---|
Naive Bayes (Multinominal) | 0.99346 | 0.96875 |
Random Forest Classifier | 1.0 | 0.91666 |
XG Boost Classifier | 0.97908 | 0.95833 |
Naive Bayes (Multinominal) Training Set Accuracy | Naive Bayes (Multinominal) Testing Set Accuracy |
---|---|
If you want to contact me, you can reach me through below handles.
© 2021 Saurav Mukherjee