Skip to content

myurasov/Kaggle-Toxic

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

73 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Code for Toxic Comment Classification Challenge

https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge/

Setup

Download Dataset

$ docker/docker.sh "src/download_dataset.sh"

kaggle.json with valid API token needs to be placed in the application directory prior to downloading data or uploading submissions.

Download Pre-trained BERT

$ docker/docker.sh "src/download_bert.sh"

Prepare Dataset for BERT

$ docker/docker.sh "src/preprocess_for_bert.py"

Train BERT-based Classifier

$ docker/docker.sh "src/train_bert.py [arguments]"

Options available:

-h, --help
--run RUN
--max_items MAX_ITEMS
--epochs EPOCHS
--batch BATCH
--lr_start LR_START
--val_split VAL_SPLIT
--early_stop_patience EARLY_STOP_PATIENCE
--samples_per_epoch SAMPLES_PER_EPOCH

To run with Horovod:

$ docker/docker.sh --gpus='"device=0,1,2,###"' "horovodrun -np ### src/train_bert_hvd.py"

Generating submission with BERT-based Classifier

$ docker/docker.sh "src/infer_bert.py"

Options available:

-h, --help
--model_file MODEL_FILE
--submission_file SUBMISSION_FILE
--batch BATCH
--max_items MAX_ITEMS

Starting Jupyter Lab and TensorBoard

$ docker/docker-forever.sh [--jupyter_port=####|8888] [--tensorboard_port=####|6006]

@see

About

Experiments with Kaggle "Toxic Comment Classification Challenge" (https://www.kaggle.com/c/jigsaw-toxic-comment-classification-challenge)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published