Add doc about how to convert piper models to sherpa-onnx (#516)
csukuangfj authored Dec 13, 2023
1 parent 10444b9 commit 3d04d06
Showing 5 changed files with 198 additions and 1 deletion.
Binary file added docs/source/_static/piper/test.wav
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -21,7 +21,7 @@
# -- Project information -----------------------------------------------------

project = "sherpa"
- copyright = "2022, sherpa development team"
+ copyright = "2022-2023, sherpa development team"
author = "sherpa development team"


68 changes: 68 additions & 0 deletions docs/source/onnx/tts/code/piper.py
@@ -0,0 +1,68 @@
#!/usr/bin/env python3

import json
import os
from typing import Any, Dict

import onnx


def add_meta_data(filename: str, meta_data: Dict[str, Any]):
    """Add meta data to an ONNX model. It is changed in-place.

    Args:
      filename:
        Filename of the ONNX model to be changed.
      meta_data:
        Key-value pairs.
    """
    model = onnx.load(filename)
    for key, value in meta_data.items():
        meta = model.metadata_props.add()
        meta.key = key
        meta.value = str(value)

    onnx.save(model, filename)


def load_config(model):
    with open(f"{model}.json", "r") as file:
        config = json.load(file)
    return config


def generate_tokens(config):
    id_map = config["phoneme_id_map"]
    with open("tokens.txt", "w", encoding="utf-8") as f:
        for s, i in id_map.items():
            f.write(f"{s} {i[0]}\n")
    print("Generated tokens.txt")


def main():
    # Caution: Please change the filename
    filename = "en_US-amy-low.onnx"

    # The rest of the file should not be changed.
    # You only need to change the above filename = "xxx.onnx" in this file

    config = load_config(filename)

    print("generate tokens")
    generate_tokens(config)

    print("add model metadata")
    meta_data = {
        "model_type": "vits",
        "comment": "piper",  # must be piper for models from piper
        "language": config["language"]["name_english"],
        "voice": config["espeak"]["voice"],  # e.g., en-us
        "has_espeak": 1,
        "n_speakers": config["num_speakers"],
        "sample_rate": config["audio"]["sample_rate"],
    }
    print(meta_data)
    add_meta_data(filename, meta_data)


main()
1 change: 1 addition & 0 deletions docs/source/onnx/tts/index.rst
@@ -12,4 +12,5 @@ to install `sherpa-onnx`_ before you continue.

./hf-space.rst
./pretrained_models/index
./piper
./faq
128 changes: 128 additions & 0 deletions docs/source/onnx/tts/piper.rst
@@ -0,0 +1,128 @@
Piper
=====

In this section, we describe how to convert `piper`_ pre-trained models
from `<https://huggingface.co/rhasspy/piper-voices>`_.

.. hint::

   You can find ``all`` of the converted models from `piper`_ at the following address:

   `<https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models>`_

   If you want to convert your own pre-trained `piper`_ models or if you want to
   learn how the conversion works, please read on.

   Otherwise, you only need to download the converted models from the above link.

Note that there are pre-trained models for over 30 languages from `piper`_. All models
share the same conversion method, so we use an American English model in this
section as an example.

Install dependencies
--------------------

.. code-block:: bash

   pip install onnx onnxruntime

.. hint::

   We suggest that you always use the latest version of onnxruntime.

Find the pre-trained model from piper
-------------------------------------

All American English models from `piper`_ can be found at
`<https://huggingface.co/rhasspy/piper-voices/tree/main/en/en_US>`_.

We use `<https://huggingface.co/rhasspy/piper-voices/tree/main/en/en_US/amy/low>`_ as
an example in this section.

Download the pre-trained model
------------------------------

We need to download two files for each model:

.. code-block:: bash

   wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/low/en_US-amy-low.onnx
   wget https://huggingface.co/rhasspy/piper-voices/resolve/main/en/en_US/amy/low/en_US-amy-low.onnx.json

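The ``.onnx.json`` config file drives the whole conversion. As a rough sketch, the following hand-made dict mirrors only the fields the conversion script reads; the values here are illustrative assumptions, not copied from the real Amy config:

```python
# A synthetic stand-in for en_US-amy-low.onnx.json, showing only the
# fields that the conversion script below actually reads.
# All values are illustrative.
config = {
    "audio": {"sample_rate": 16000},
    "espeak": {"voice": "en-us"},
    "language": {"name_english": "English"},
    "num_speakers": 1,
    "phoneme_id_map": {"_": [0], "^": [1], "a": [3]},
}

# These are exactly the lookups performed when building the meta data dict
print(config["language"]["name_english"])  # -> language
print(config["espeak"]["voice"])           # -> voice
print(config["num_speakers"])              # -> n_speakers
print(config["audio"]["sample_rate"])      # -> sample_rate
```

If any of these keys is missing from your model's config file, the conversion script will raise a ``KeyError``, so this is a quick way to see what it depends on.
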
Add meta data to the onnx model
-------------------------------

Please use the following code to add meta data to the downloaded onnx model.

.. literalinclude:: ./code/piper.py
   :language: python

After running the above script, ``en_US-amy-low.onnx`` is updated in-place with
meta data, and a new file ``tokens.txt`` is generated.

From now on, you don't need the config json file ``en_US-amy-low.onnx.json`` any longer.
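The generated ``tokens.txt`` maps each phoneme symbol to its integer id, one pair per line. A minimal sketch of the writing logic, using a tiny made-up ``phoneme_id_map`` (the real map comes from the config JSON and is much larger):

```python
# Reproduce the tokens.txt writing logic from the conversion script on a
# tiny, made-up phoneme_id_map. Each value in the map is a list of ids;
# only its first element is used.
id_map = {"_": [0], "^": [1], "$": [2], "a": [3]}

lines = [f"{symbol} {ids[0]}" for symbol, ids in id_map.items()]
content = "\n".join(lines) + "\n"
print(content)
```

Each line is simply ``<symbol> <id>``, which is the format `sherpa-onnx`_ reads back at runtime.
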

Download espeak-ng-data
-----------------------

.. code-block:: bash

   wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/espeak-ng-data.tar.bz2
   tar xf espeak-ng-data.tar.bz2

Note that ``espeak-ng-data.tar.bz2`` is shared by all models from `piper`_, no matter
which language you are using for your model.

Test your converted model
-------------------------

To have a quick test of your converted model, you can use

.. code-block:: bash

   pip install sherpa-onnx

to install `sherpa-onnx`_ and then use the following commands to test your model:

.. code-block:: bash

   # The command "pip install sherpa-onnx" will install several binaries,
   # including the following one
   which sherpa-onnx-offline-tts

   sherpa-onnx-offline-tts \
     --vits-model=./en_US-amy-low.onnx \
     --vits-tokens=./tokens.txt \
     --vits-data-dir=./espeak-ng-data \
     --output-filename=./test.wav \
     "How are you doing? This is a text-to-speech application using next generation Kaldi."

The above command should generate a wave file ``test.wav``.
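If you want to double-check the generated audio programmatically, Python's standard ``wave`` module can read its header; the sample rate should match the ``sample_rate`` stored in the model's meta data. A hedged sketch (it first synthesizes a short silent wav as a stand-in, since ``test.wav`` may not exist when you run this on its own; point it at ``./test.wav`` to check real output):

```python
import wave

# Write a 1-second silent 16-bit mono wav at 16000 Hz as a stand-in
# for test.wav.
sample_rate = 16000
with wave.open("check_me.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)  # 16-bit samples
    w.setframerate(sample_rate)
    w.writeframes(b"\x00\x00" * sample_rate)  # 1 second of silence

# Read the header back and verify the sample rate and duration
with wave.open("check_me.wav", "rb") as r:
    rate = r.getframerate()
    seconds = r.getnframes() / rate
print(rate, seconds)
```

A mismatch between the wav's sample rate and the model's ``sample_rate`` meta data usually indicates the wrong config JSON was used during conversion.
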

.. raw:: html

   <table>
     <tr>
       <th>Wave filename</th>
       <th>Content</th>
       <th>Text</th>
     </tr>
     <tr>
       <td>test.wav</td>
       <td>
         <audio title="Generated ./test.wav" controls="controls">
           <source src="/sherpa/_static/piper/test.wav" type="audio/wav">
           Your browser does not support the <code>audio</code> element.
         </audio>
       </td>
       <td>
         How are you doing? This is a text-to-speech application using next generation Kaldi.
       </td>
     </tr>
   </table>


Congratulations! You have successfully converted a model from `piper`_ and run it with `sherpa-onnx`_.

