Run the following commands to install other packages:
pip3 install -r requirements.txt
pip3 install -U spacy
Download the data from the Toxic Comment Classification Challenge webpage.
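Before running anything, a quick sanity check on the download can save a failed run. This is a minimal sketch that assumes the training data is the challenge's train.csv file, with its standard header of an id, the comment text and six label columns. The function name check_train_csv is hypothetical, and where you save the file is up to you.

```python
import csv
from pathlib import Path

# Header of the Toxic Comment Classification Challenge training file.
EXPECTED_COLUMNS = [
    "id", "comment_text", "toxic", "severe_toxic",
    "obscene", "threat", "insult", "identity_hate",
]

def check_train_csv(path):
    """Return the expected columns missing from the CSV header (empty list if all present)."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        header = next(csv.reader(f), [])
    return [col for col in EXPECTED_COLUMNS if col not in header]
```

For example, check_train_csv("path/to/train.csv") returning [] means the header looks right.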
Navigate to the src folder.
Machine Learning models: Run the following commands in order:
python3 preprocessing.py
python3 models.py
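Running the two scripts in order is equivalent to `python3 preprocessing.py && python3 models.py` in a shell: models.py only starts if preprocessing.py succeeds. A minimal Python sketch of the same idea is below. The helper run_in_order is hypothetical and is not part of the repository.

```python
import subprocess
import sys

def run_in_order(scripts, cwd="."):
    """Run each script with the current Python interpreter, stopping at the first failure."""
    for script in scripts:
        # The script path is resolved relative to `cwd` (the child's working directory).
        # check=True raises CalledProcessError on a non-zero exit, so later scripts
        # never run on missing or partial output from an earlier one.
        subprocess.run([sys.executable, script], cwd=cwd, check=True)
```

Usage: run_in_order(["preprocessing.py", "models.py"], cwd="src").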
fastText models: Run the fasttext notebook.
Deep Learning models: Run the deeplearning and deeplearning2 notebooks.
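If you prefer to run the notebooks without opening Jupyter, one option is the `jupyter nbconvert --execute` command. The sketch below assumes the notebooks are named fasttext.ipynb, deeplearning.ipynb and deeplearning2.ipynb (hypothetical names; check the actual files in src) and that Jupyter with nbconvert is installed. The per-cell timeout is disabled because training cells can run for a long time.

```python
import subprocess

# Hypothetical file names: adjust to the actual notebook files in src/.
NOTEBOOKS = ["fasttext.ipynb", "deeplearning.ipynb", "deeplearning2.ipynb"]

def nbconvert_command(notebook, timeout=-1):
    """Build a `jupyter nbconvert` call that executes a notebook and saves its outputs in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        # A timeout of -1 disables the per-cell execution timeout.
        f"--ExecutePreprocessor.timeout={timeout}",
        notebook,
    ]

def run_notebooks(notebooks=NOTEBOOKS, cwd="."):
    """Execute each notebook in turn, stopping if one fails."""
    for nb in notebooks:
        subprocess.run(nbconvert_command(nb), cwd=cwd, check=True)
```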
All vectorized n-grams, AUC-ROC summary dataframes, predictions and probabilities will be dumped in the pickle_objects/ folder.
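To inspect these results afterwards, you can unpickle them with the standard library. This is a minimal sketch assuming every regular file directly inside pickle_objects/ is a pickle; load_pickles is a hypothetical helper. The summary dataframes need pandas importable to unpickle, and you should only load pickles you trust, since unpickling can execute arbitrary code.

```python
import pickle
from pathlib import Path

def load_pickles(folder="pickle_objects"):
    """Unpickle every regular file directly inside `folder`, keyed by file name.

    Subfolders (such as models/) are skipped.
    """
    objects = {}
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            with path.open("rb") as f:
                objects[path.name] = pickle.load(f)
    return objects
```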
Models and ROC curve plots will be dumped in the pickle_objects/models/ and plots/ folders (or in pickle_objects/models_features/ and plots_features/ if you choose to use extra features; see models.py).
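The choice of output folders can be summarized in a few lines. This sketch assumes the folders are created relative to the directory you run the code from (src); the use_features flag and both helper functions are hypothetical stand-ins for the extra-features option in models.py.

```python
from pathlib import Path

def output_dirs(use_features=False, root="."):
    """Return (models_dir, plots_dir) for runs with or without extra features."""
    root = Path(root)
    if use_features:
        return root / "pickle_objects" / "models_features", root / "plots_features"
    return root / "pickle_objects" / "models", root / "plots"

def list_outputs(use_features=False, root="."):
    """Map each output folder to its sorted file names (empty list if the folder does not exist yet)."""
    return {
        str(d): sorted(p.name for p in d.iterdir()) if d.is_dir() else []
        for d in output_dirs(use_features, root)
    }
```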