Part of my work from a two-week challenge for the CommonLit Readability Prize competition.
Here I use roberta-base, but several other models are options, and ensembling multiple models worked well for this competition (a minimal averaging sketch follows the reference links below):
- distilroberta-base
- roberta-large
- albert-base-v2 (did not score well in this competition)
etc...
Reference notebooks:
- https://www.kaggle.com/maunish/clrp-pytorch-roberta-pretrain
- https://www.kaggle.com/maunish/clrp-pytorch-roberta-finetune
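The ensembling mentioned above can be as simple as averaging each model's test-set predictions. Below is a minimal sketch in that spirit; the checkpoint paths, the loading of a single-output regression head via `AutoModelForSequenceClassification`, and the equal weighting are illustrative assumptions, not the exact setup from the referenced notebooks.

```python
# Minimal ensembling sketch (illustrative): average the test-set predictions
# of several fine-tuned regression models. Checkpoint paths are hypothetical.
import numpy as np
import pandas as pd
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CHECKPOINTS = ["./clrp-roberta-base", "./clrp-distilroberta-base"]  # hypothetical local paths

test = pd.read_csv("data/test.csv")

def predict(checkpoint, texts):
    """Score every excerpt with one fine-tuned single-output regression model."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=1)
    model.eval()
    scores = []
    with torch.no_grad():
        for text in texts:
            inputs = tokenizer(text, truncation=True, max_length=256, return_tensors="pt")
            scores.append(model(**inputs).logits.squeeze().item())
    return np.array(scores)

# Unweighted average across models; weighted blends are a common refinement.
preds = np.mean([predict(ckpt, test["excerpt"]) for ckpt in CHECKPOINTS], axis=0)

pd.DataFrame({"id": test["id"], "target": preds}).to_csv("submission.csv", index=False)
```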
Setting up docker-compose.yml as follows may be an easy way to build and run the container.
version: '3.8'
services:
  clrp:
    build:
      context: clrp
      dockerfile: Dockerfile
    container_name: clrp
    user: root
    environment:
      NVIDIA_VISIBLE_DEVICES: all
    ports:
      - "8889:8888"
    tty: true
    volumes:
      - ./clrp:/clrp
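With this file in place (and assuming Docker, Docker Compose, and the NVIDIA container runtime are installed), the container can be built, started, and entered from the directory holding docker-compose.yml:

docker-compose up -d --build
docker exec -it clrp bash

Port 8888 inside the container is mapped to 8889 on the host, so if the image starts a Jupyter server (an assumption about the Dockerfile, which is not shown here), it is reachable at http://localhost:8889.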
At the top of the project directory, create a data folder
cd .
mkdir data
and place the competition datasets in it:
data
├── sample_submission.csv
├── test.csv
└── train.csv
You can easily download the datasets using the Kaggle API:
kaggle competitions download -c commonlitreadabilityprize
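The command above downloads a zip archive; unzip it into the data folder, e.g.

unzip commonlitreadabilityprize.zip -d data

As a quick sanity check (assuming pandas is installed, and using the excerpt and target column names from the competition data), the training set can then be loaded like this:

```python
import pandas as pd

# Load the training data and peek at the text and label columns.
train = pd.read_csv("data/train.csv")
print(train.shape)
print(train[["excerpt", "target"]].head())
```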