- More info in the model card: https://huggingface.co/maximxls/text-normalization-ru-terrible
- Clone this repo: `git clone https://github.com/maximxlss/text_normalization`, then `cd text_normalization`
- Install requirements: `pip install -r requirements.txt`
- Install PyTorch
- Download `ru_train.csv` from this Kaggle challenge
- Run `python preprocess.py` (takes time)
- Run `python train_tokenizer.py` (also takes time)
- Tweak settings in `train.py`
- Run `python train.py`
- I reset the scheduler manually during training (see `train.py`), so keep that in mind. You can see the details of the training process in the metrics
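
The three script steps above can be chained so that a failure in one stops the rest. The following is a minimal sketch, not part of the repo: `build_commands` and `run_pipeline` are hypothetical helper names, and it assumes `ru_train.csv` sits in the repository root (check `preprocess.py` for the path it actually expects). It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from pathlib import Path

# Script names come from the steps listed in this README, in the same order.
STEPS = ["preprocess.py", "train_tokenizer.py", "train.py"]


def build_commands(python=sys.executable):
    """Return one argument list per step, e.g. [python, "preprocess.py"]."""
    return [[python, script] for script in STEPS]


def run_pipeline(repo_dir=".", dry_run=True):
    """Print each step in order; execute it unless dry_run is set."""
    repo = Path(repo_dir)
    # Assumption: the dataset lives in the repo root next to the scripts.
    if not dry_run and not (repo / "ru_train.csv").exists():
        raise FileNotFoundError(
            "ru_train.csv not found; download it from the Kaggle challenge first"
        )
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError if a step exits non-zero,
            # so later steps never run on top of a failed one.
            subprocess.run(cmd, cwd=repo, check=True)


if __name__ == "__main__":
    # Dry run by default; pass --run to actually execute the steps.
    run_pipeline(dry_run="--run" not in sys.argv)
```

Saved under a name of your choosing inside the cloned repo, running it with `--run` executes the steps for real. Remember to tweak the settings in `train.py` beforehand, since the last step starts training immediately.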