SentencePieceTrainer is cumbersome to call, and it is not available in every environment where we would like to run our tests.
We should remove this usage from our tests, either by:
- Hard-coding sentencepiece proto objects as "text proto" strings in our modeling test files (can you do this in Python?), or
- Saving the small proto files we need for testing in tests/test_data and passing their paths directly when running modeling tests.
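For the second option, here is a minimal sketch of a test helper that locates a committed proto file and fails loudly if it is missing. The helper names (get_test_data_path, load_test_proto_bytes) and the assumption that the helper module sits directly inside tests/ are hypothetical. How the path or bytes are handed to the tokenizer depends on that tokenizer's own constructor and is not shown here.

```python
import pathlib
from typing import Optional


def get_test_data_path(
    filename: str, tests_dir: Optional[pathlib.Path] = None
) -> pathlib.Path:
    """Return the path of a file checked in under tests/test_data.

    Assumption (hypothetical layout): by default this helper lives directly
    inside the tests/ directory, so tests/test_data sits next to it.
    """
    if tests_dir is None:
        tests_dir = pathlib.Path(__file__).resolve().parent
    path = pathlib.Path(tests_dir) / "test_data" / filename
    if not path.is_file():
        # Fail loudly rather than falling back to training a model at test time.
        raise FileNotFoundError(f"Missing test asset: {path}")
    return path


def load_test_proto_bytes(
    filename: str, tests_dir: Optional[pathlib.Path] = None
) -> bytes:
    """Read a small serialized sentencepiece proto committed for tests."""
    return get_test_data_path(filename, tests_dir).read_bytes()
```

A modeling test would then look up a small committed file (for example a hypothetical "tiny_vocab.spm") and pass its path or bytes to the tokenizer instead of calling SentencePieceTrainer. Raising FileNotFoundError up front keeps a missing asset from surfacing later as a confusing tokenizer error.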