Add doc about how to convert piper models to sherpa-onnx (#516)
csukuangfj authored Dec 13, 2023
1 parent 10444b9 commit 3d04d06
Showing 5 changed files with 198 additions and 1 deletion.
Binary file added docs/source/_static/piper/test.wav
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -21,7 +21,7 @@
# -- Project information -----------------------------------------------------

project = "sherpa"
- copyright = "2022, sherpa development team"
+ copyright = "2022-2023, sherpa development team"
author = "sherpa development team"


68 changes: 68 additions & 0 deletions docs/source/onnx/tts/code/piper.py
@@ -0,0 +1,68 @@
#!/usr/bin/env python3

import json
import os
from typing import Any, Dict

import onnx


def add_meta_data(filename: str, meta_data: Dict[str, Any]):
    """Add meta data to an ONNX model. It is changed in-place.

    Args:
      filename:
        Filename of the ONNX model to be changed.
      meta_data:
        Key-value pairs.
    """
    model = onnx.load(filename)
    for key, value in meta_data.items():
        meta = model.metadata_props.add()
        meta.key = key
        meta.value = str(value)

    onnx.save(model, filename)


def load_config(model):
    with open(f"{model}.json", "r") as file:
        config = json.load(file)
    return config


def generate_tokens(config):
    id_map = config["phoneme_id_map"]
    with open("tokens.txt", "w", encoding="utf-8") as f:
        for s, i in id_map.items():
            f.write(f"{s} {i[0]}\n")
    print("Generated tokens.txt")


def main():
    # Caution: Please change the filename
    filename = "en_US-amy-low.onnx"

    # The rest of the file should not be changed.
    # You only need to change the above filename = "xxx.onnx" in this file

    config = load_config(filename)

    print("generate tokens")
    generate_tokens(config)

    print("add model metadata")
    meta_data = {
        "model_type": "vits",
        "comment": "piper",  # must be piper for models from piper
        "language": config["language"]["name_english"],
        "voice": config["espeak"]["voice"],  # e.g., en-us
        "has_espeak": 1,
        "n_speakers": config["num_speakers"],
        "sample_rate": config["audio"]["sample_rate"],
    }
    print(meta_data)
    add_meta_data(filename, meta_data)


main()
1 change: 1 addition & 0 deletions docs/source/onnx/tts/index.rst
@@ -12,4 +12,5 @@ to install `sherpa-onnx`_ before you continue.

./hf-space.rst
./pretrained_models/index
./piper
./faq
128 changes: 128 additions & 0 deletions docs/source/onnx/tts/piper.rst
@@ -0,0 +1,128 @@
Piper
=====

In this section, we describe how to convert `piper`_ pre-trained models
from `<https://huggingface.co/rhasspy/piper-voices>`_.

.. hint::

   You can find ``all`` of the converted models from `piper`_ at the following address:

   `<https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models>`_

   If you want to convert your own pre-trained `piper`_ models or if you want to
   learn how the conversion works, please read on.

   Otherwise, you only need to download the converted models from the above link.

Note that there are pre-trained models for over 30 languages from `piper`_. All models
share the same conversion method, so we use an American English model in this
section as an example.

Install dependencies
--------------------

.. code-block:: bash

   pip install onnx onnxruntime

.. hint::

   We suggest that you always use the latest version of onnxruntime.

Find the pre-trained model from piper
-------------------------------------

All American English models from `piper`_ can be found at
`<https://huggingface.co/rhasspy/piper-voices/tree/main/en/en_US>`_.

We use `<https://huggingface.co/rhasspy/piper-voices/tree/main/en/en_US/amy/low>`_ as
an example in this section.

Download the pre-trained model
------------------------------

We need to download two files for each model:

.. code-block:: bash

   wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/low/en_US-amy-low.onnx
   wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/low/en_US-amy-low.onnx.json

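The ``.onnx.json`` config file drives the whole conversion. As a rough sketch, the following hand-made dict mirrors only the fields the conversion script reads; the values here are illustrative assumptions, not copied from the real Amy config:

```python
# A synthetic stand-in for en_US-amy-low.onnx.json, showing only the
# fields that the conversion script below actually reads.
# All values are illustrative.
config = {
    "audio": {"sample_rate": 16000},
    "espeak": {"voice": "en-us"},
    "language": {"name_english": "English"},
    "num_speakers": 1,
    "phoneme_id_map": {"_": [0], "^": [1], "a": [3]},
}

# These are exactly the lookups performed when building the meta data dict
print(config["language"]["name_english"])  # -> language
print(config["espeak"]["voice"])           # -> voice
print(config["num_speakers"])              # -> n_speakers
print(config["audio"]["sample_rate"])      # -> sample_rate
```

If any of these keys is missing from your model's config file, the conversion script will raise a ``KeyError``, so this is a quick way to see what it depends on.
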
Add meta data to the onnx model
-------------------------------

Please use the following code to add meta data to the downloaded onnx model.

.. literalinclude:: ./code/piper.py
   :language: python

After running the above script, ``en_US-amy-low.onnx`` is updated in-place with
meta data, and a new file ``tokens.txt`` is generated.

From now on, you don't need the config json file ``en_US-amy-low.onnx.json`` any longer.
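The generated ``tokens.txt`` maps each phoneme symbol to its integer id, one pair per line. A minimal sketch of the writing logic, using a tiny made-up ``phoneme_id_map`` (the real map comes from the config JSON and is much larger):

```python
# Reproduce the tokens.txt writing logic from the conversion script on a
# tiny, made-up phoneme_id_map. Each value in the map is a list of ids;
# only its first element is used.
id_map = {"_": [0], "^": [1], "$": [2], "a": [3]}

lines = [f"{symbol} {ids[0]}" for symbol, ids in id_map.items()]
content = "\n".join(lines) + "\n"
print(content)
```

Each line is simply ``<symbol> <id>``, which is the format `sherpa-onnx`_ reads back at runtime.
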

Download espeak-ng-data
-----------------------

.. code-block:: bash

   wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/espeak-ng-data.tar.bz2
   tar xf espeak-ng-data.tar.bz2

Note that ``espeak-ng-data.tar.bz2`` is shared by all models from `piper`_, no matter
which language you are using for your model.

Test your converted model
-------------------------

To have a quick test of your converted model, you can use

.. code-block:: bash

   pip install sherpa-onnx

to install `sherpa-onnx`_ and then use the following commands to test your model:

.. code-block:: bash

   # The command "pip install sherpa-onnx" will install several binaries,
   # including the following one
   which sherpa-onnx-offline-tts

   sherpa-onnx-offline-tts \
     --vits-model=./en_US-amy-low.onnx \
     --vits-tokens=./tokens.txt \
     --vits-data-dir=./espeak-ng-data \
     --output-filename=./test.wav \
     "How are you doing? This is a text-to-speech application using next generation Kaldi."

The above command should generate a wave file ``test.wav``.
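If you want to double-check the generated audio programmatically, Python's standard ``wave`` module can read its header; the sample rate should match the ``sample_rate`` stored in the model's meta data. A hedged sketch (it first synthesizes a short silent wav as a stand-in, since ``test.wav`` may not exist when you run this on its own; point it at ``./test.wav`` to check real output):

```python
import wave

# Write a 1-second silent 16-bit mono wav at 16000 Hz as a stand-in
# for test.wav.
sample_rate = 16000
with wave.open("check_me.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 16-bit samples
    w.setframerate(sample_rate)
    w.writeframes(b"\x00\x00" * sample_rate)  # 1 second of silence

# Read the header back and verify the sample rate and duration
with wave.open("check_me.wav", "rb") as r:
    rate = r.getframerate()
    seconds = r.getnframes() / rate
print(rate, seconds)
```

A mismatch between the wav's sample rate and the model's ``sample_rate`` meta data usually indicates the wrong config JSON was used during conversion.
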

.. raw:: html

   <table>
     <tr>
       <th>Wave filename</th>
       <th>Content</th>
       <th>Text</th>
     </tr>
     <tr>
       <td>test.wav</td>
       <td>
         <audio title="Generated ./test.wav" controls="controls">
           <source src="/sherpa/_static/piper/test.wav" type="audio/wav">
           Your browser does not support the <code>audio</code> element.
         </audio>
       </td>
       <td>
         How are you doing? This is a text-to-speech application using next generation Kaldi.
       </td>
     </tr>
   </table>


Congratulations! You have successfully converted a model from `piper`_ and run it with `sherpa-onnx`_.

