This is the project repository for my 2024 Rustnation talk, Creating a Text-To-Speech System in Rust. Here you can find the main TTS engine, the slides, and a collection of useful scripts and resources. Check out each module's documentation for more content I didn't have time to discuss in the talk!
For the other projects mentioned look no further:
You can also watch the talk on YouTube!
This repo uses git-lfs to store the neural networks. Make sure git-lfs is set up before cloning, or things may not work as expected.
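If the repository was cloned without git-lfs, the model files on disk are small text pointers rather than the real networks. The following is a minimal sketch, not code from this project, of how one could detect that case before trying to load a model. The function name `is_lfs_pointer` is hypothetical; the prefix it checks is the first line that the Git LFS pointer format places at the top of every pointer file.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// First line of a Git LFS pointer file.
const LFS_POINTER_PREFIX: &[u8] = b"version https://git-lfs.github.com/spec/v1";

/// Hypothetical helper: returns true if the file at `path` starts with the
/// Git LFS pointer header, meaning the real content was never fetched.
pub fn is_lfs_pointer(path: &Path) -> io::Result<bool> {
    let mut head = Vec::with_capacity(LFS_POINTER_PREFIX.len());
    // Read at most as many bytes as the prefix is long; shorter files
    // simply produce a shorter buffer that cannot match.
    File::open(path)?
        .take(LFS_POINTER_PREFIX.len() as u64)
        .read_to_end(&mut head)?;
    Ok(head == LFS_POINTER_PREFIX)
}
```

Calling this on a model path before handing it to the inference runtime lets you print a clear "fetch the git-lfs files first" message instead of a confusing model parsing error.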
I'm also using the dynamic library feature for ORT. You will need to download the correct ORT version for your system here and set the ORT_DYLIB_PATH env var to the path to libonnxruntime.so.
Alternatively, if the ORT project downloads the correct version for your system, you can manually remove the dynamic library feature.
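A missing or mistyped ORT_DYLIB_PATH tends to show up as a library loading failure at runtime. The following is a small sketch, using only the standard library, of a sanity check one could run at startup. The `DylibCheck` type and both function names are hypothetical and not part of this project or of ORT; only the environment variable name comes from the text above.

```rust
use std::env;
use std::path::PathBuf;

/// Hypothetical result type for the check below.
#[derive(Debug, PartialEq)]
pub enum DylibCheck {
    /// The variable is set and points at an existing file.
    Found(PathBuf),
    /// The variable is unset or empty.
    NotSet,
    /// The variable is set, but no file exists at that path.
    Missing(PathBuf),
}

/// Classifies a candidate value for ORT_DYLIB_PATH.
pub fn check_dylib_path(value: Option<&str>) -> DylibCheck {
    match value {
        None => DylibCheck::NotSet,
        Some(v) if v.is_empty() => DylibCheck::NotSet,
        Some(v) => {
            let path = PathBuf::from(v);
            if path.is_file() {
                DylibCheck::Found(path)
            } else {
                DylibCheck::Missing(path)
            }
        }
    }
}

/// Reads ORT_DYLIB_PATH from the process environment and classifies it.
pub fn check_ort_dylib_env() -> DylibCheck {
    check_dylib_path(env::var("ORT_DYLIB_PATH").ok().as_deref())
}
```

This only confirms that a file exists at the given path. It does not verify that the file is the right ORT version for your system.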
There are two binaries in the project: one to prepare/analyse training data and another to run the TTS.
Mac users may want to refer to this issue for getting cBlas working. You will also have to open the onnxruntime dylib in Finder so you can bypass the Gatekeeper checks for the file, as it isn't signed.
The presentation slides! These are done using typst.
The scripts here are mainly for dataset cleaning and for plotting the images used in the slides. There's also a folder inside called speedyspeech for an old and largely abandoned part of the project.
In the resources folder I've added a custom dictionary. This includes tokens in the LJ Speech corpus which aren't present in CMU Dict. For this I've used the data_cleaning.py script in scripts and the gruut grapheme-to-phoneme (g2p) models. If I'd had time to do my own g2p this would have also been pure Rust.
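To illustrate how a custom dictionary can fill the gaps in CMU Dict, here is a minimal sketch of a two-level lookup. It is not the engine's actual implementation. It assumes that both dictionaries use the plain CMU Dict text layout (a word followed by whitespace-separated phonemes, with `;;;` marking comment lines), which may not match the file in the resources folder. The names `parse_dict` and `lookup` are hypothetical.

```rust
use std::collections::HashMap;

/// Maps an upper-cased word to its phoneme sequence.
pub type Pronunciations = HashMap<String, Vec<String>>;

/// Parses CMU Dict style text: one entry per line, the word first, then its
/// phonemes separated by whitespace. Lines starting with ";;;" are comments.
pub fn parse_dict(text: &str) -> Pronunciations {
    let mut dict = Pronunciations::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() || line.starts_with(";;;") {
            continue;
        }
        let mut parts = line.split_whitespace();
        let Some(word) = parts.next() else { continue };
        let phonemes: Vec<String> = parts.map(str::to_string).collect();
        if phonemes.is_empty() {
            continue; // a word with no phonemes is not a usable entry
        }
        // Keep the first entry seen for a given key.
        dict.entry(word.to_uppercase()).or_insert(phonemes);
    }
    dict
}

/// Looks a word up in the main dictionary first and falls back to the
/// custom dictionary for tokens the main one does not contain.
pub fn lookup<'a>(
    word: &str,
    cmu: &'a Pronunciations,
    custom: &'a Pronunciations,
) -> Option<&'a [String]> {
    let key = word.to_uppercase();
    cmu.get(&key)
        .or_else(|| custom.get(&key))
        .map(Vec::as_slice)
}
```

Words found in neither dictionary return `None`, which is the point where a g2p model would normally take over.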
For data_cleaning.py you will need to download the librispeech lexicon here.
There is also disabled support for loading a pre-trained speedy speech model via candle/torch/tract. Unfortunately, due to limited ONNX support outside of ORT, this ended up being abandoned. The code should still work for other ONNX models, or for JIT traced torch models, which work better with those dependencies.
Unfortunately, I can't pretend any of it is useful, but for someone considering using any of those crates these modules can be a pointer on how to start using them. I've also added a vast array of doc comments explaining some of the conversion process and difficulties I faced.