text_normalization

Clone this repo: git clone https://github.com/maximxlss/text_normalization
cd text_normalization
Install requirements: pip install -r requirements.txt
Install PyTorch
Download ru_train.csv from the Kaggle Text Normalization Challenge (Russian Language)
Run python preprocess.py (takes time)
Run python train_tokenizer.py (also takes time)
Tweak settings in train.py
Run python train.py
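
The three script steps above can also be chained from a single Python file. The following is only a minimal sketch, not part of the repo: the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes that ru_train.csv sits in the repo root and that the settings in train.py have already been tweaked by hand.

```python
import subprocess
import sys
from pathlib import Path

# Script order taken from the steps above.
STEPS = ["preprocess.py", "train_tokenizer.py", "train.py"]


def build_commands(python=sys.executable):
    """Return one command (as an argument list) per step, in order."""
    return [[python, script] for script in STEPS]


def run_pipeline(repo_dir=".", data_file="ru_train.csv"):
    """Run preprocessing, tokenizer training and model training in sequence."""
    repo = Path(repo_dir)
    # Assumption: the scripts read the Kaggle file from the repo root.
    if not (repo / data_file).is_file():
        raise FileNotFoundError(f"{data_file} not found in {repo.resolve()}")
    for cmd in build_commands():
        print("Running:", " ".join(cmd))
        # check=True stops the pipeline as soon as one step fails.
        subprocess.run(cmd, cwd=repo, check=True)
```

Calling `run_pipeline()` from inside the cloned directory runs the steps one after another and aborts on the first failure, which is useful because the first two steps take a while.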
I reset the scheduler manually during training (see train.py), so keep that in mind when reproducing the run. The details of the training process are in the metrics.
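
To illustrate what a manual scheduler reset does to the learning rate, here is a small pure-Python sketch. It is not the code from train.py: the warmup-then-decay schedule `lr_at`, the class `ResettableSchedule` and all parameter values are hypothetical, and the real scheduler may have a different shape. In a PyTorch training loop a similar effect is usually obtained by constructing a fresh scheduler object for the existing optimizer.

```python
def lr_at(step, base_lr=1e-3, warmup=100, total=1000):
    """Hypothetical schedule: linear warmup, then linear decay to zero."""
    if step < warmup:
        return base_lr * (step + 1) / warmup
    remaining = max(total - step, 0)
    return base_lr * remaining / (total - warmup)


class ResettableSchedule:
    """Tracks steps since the last reset separately from global steps."""

    def __init__(self, **schedule_kwargs):
        self.schedule_kwargs = schedule_kwargs
        self.local_step = 0   # position inside the current schedule
        self.global_step = 0  # total optimizer steps taken so far

    def step(self):
        """Return the learning rate for this step and advance both counters."""
        lr = lr_at(self.local_step, **self.schedule_kwargs)
        self.local_step += 1
        self.global_step += 1
        return lr

    def reset(self):
        # Restart the schedule from its beginning; training itself continues.
        self.local_step = 0
```

After `reset()`, the next `step()` returns the warmup learning rate again while `global_step` keeps counting, so a learning-rate curve plotted against global steps shows a visible restart at that point.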

