Text processing docs (NVIDIA#2046)

* refactoring text normalization docs and tutorial
* rename nemo tools to nemo text processing
* rename nemo_tools to nemo_text_processing
* rename docs
* rename files
* rename pytests
* fix pytest
* fix refactoring
* renamed functions in ITN tutorial
* fix typo
* fix path to tutorial in readme for ITN
* fix Jenkins
* fix Jenkins

Each of the commits above carries: Signed-off-by: Yang Zhang <[email protected]>
Co-authored-by: Oleksii Kuchaiev <[email protected]>
DanTremonti · Apr 10, 2021 · 5212181
1 parent d4c0811
commit 5212181
Showing 135 changed files with 378 additions and 313 deletions.
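
Most of the changes below are a mechanical rename of the `nemo_tools` package to `nemo_text_processing`. As a rough illustration of what that means for downstream code, here is a minimal sketch of a migration helper. The helper name `migrate_source` is hypothetical and not part of NeMo; it only handles the top-level package name, not the separate `text_denormalization` to `inverse_text_normalization` rename, and it deliberately leaves longer identifiers such as `nemo_tools_setup.sh` alone.

```python
import re

# Hypothetical helper, not part of NeMo: rewrites references to the old
# package name so that they point at the renamed package.
OLD_NAME = "nemo_tools"
NEW_NAME = "nemo_text_processing"

# Word boundaries restrict the match to the standalone package name.
# "_" counts as a word character, so "nemo_tools_setup.sh" is NOT matched,
# while "nemo_tools.text_normalization" and "nemo_tools/setup.sh" are.
_PATTERN = re.compile(rf"\b{re.escape(OLD_NAME)}\b")


def migrate_source(text: str) -> str:
    """Return text with each standalone occurrence of the old name replaced."""
    return _PATTERN.sub(NEW_NAME, text)


if __name__ == "__main__":
    print(migrate_source("import nemo_tools.text_normalization as text_normalization"))
```

File names that merely embed the old name, such as the setup script copied in the Dockerfile, still need to be renamed by hand, which is exactly where a typo is most likely to slip in.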
diff --git a/Dockerfile b/Dockerfile
@@ -85,9 +85,9 @@ COPY requirements .
 RUN for f in $(ls requirements*.txt); do pip install --disable-pip-version-check --no-cache-dir -r $f; done
 
 
-# install nemo_tools dependencies
-COPY nemo_tools/setup.sh nemo_tools_setup.sh
-RUN bash nemo_tools_setup.sh
+# install nemo_text_processing dependencies
+COPY nemo_text_processing/setup.sh nemo_text_processing_setup.sh
+RUN bash nemo_text_processing_setup.sh
 
 #install TRT tools: PT quantization support and ONNX graph optimizer
 WORKDIR /tmp/trt_build
@@ -114,7 +114,8 @@ RUN --mount=from=nemo-src,target=/tmp/nemo cd /tmp/nemo && pip install ".[all]"
     python -c "import nemo.collections.asr as nemo_asr" && \
     python -c "import nemo.collections.nlp as nemo_nlp" && \
     python -c "import nemo.collections.tts as nemo_tts" && \
-    python -c "import nemo_tools.text_normalization as text_normalization"
+    python -c "import nemo_text_processing.text_normalization as text_normalization"
+
 # TODO: Remove once 21.04 container is base container
 # install latest numba version
 RUN conda update -c numba numba -y

diff --git a/Jenkinsfile b/Jenkinsfile
@@ -135,7 +135,7 @@ pipeline {
     //   }
     // }
 
-    stage('L2: NeMo tools') {
+    stage('L2: NeMo text processing') {
       when {
         anyOf {
           branch 'main'
@@ -146,10 +146,10 @@ pipeline {
       parallel {
         stage('L2: pynini export') {
           steps {
-            sh 'cd tools/text_denormalization && python pynini_export.py /home/TestData/nlp/text_denorm/output/ && ls -R /home/TestData/nlp/text_denorm/output/ && echo ".far files created "|| exit 1'
-            sh 'cd tools/text_denormalization && cp *.grm /home/TestData/nlp/text_denorm/output/'
+            sh 'cd tools/inverse_text_normalization_deployment && python pynini_export.py /home/TestData/nlp/text_denorm/output/ && ls -R /home/TestData/nlp/text_denorm/output/ && echo ".far files created "|| exit 1'
+            sh 'cd tools/inverse_text_normalization_deployment && cp *.grm /home/TestData/nlp/text_denorm/output/'
             sh 'ls -R /home/TestData/nlp/text_denorm/output/'
-            sh 'cd nemo_tools/text_denormalization/ &&  python run_predict.py --input=/home/TestData/nlp/text_denorm/ci/test.txt --output=/home/TestData/nlp/text_denorm/output/test.pynini.txt --verbose'
+            sh 'cd nemo_text_processing/inverse_text_normalization/ &&  python run_predict.py --input=/home/TestData/nlp/text_denorm/ci/test.txt --output=/home/TestData/nlp/text_denorm/output/test.pynini.txt --verbose'
             sh 'cmp --silent /home/TestData/nlp/text_denorm/output/test.pynini.txt /home/TestData/nlp/text_denorm/ci/test_goal_py.txt || exit 1'
             sh 'rm -rf /home/TestData/nlp/text_denorm/output/*'
           }

diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -51,10 +51,10 @@ NVIDIA NeMo User Guide
 
 .. toctree::
    :maxdepth: 2
-   :caption: NeMo-Tools Package
-   :name: NeMo-Tools Package
+   :caption: NeMo-Text-Processing Package
+   :name: NeMo-Text-Processing Package
 
-   nemo_tools/intro
+   nemo_text_processing/intro
 
 .. toctree::
    :maxdepth: 2

diff --git a/docs/source/nemo_text_processing/intro.rst b/docs/source/nemo_text_processing/intro.rst
@@ -0,0 +1,11 @@
+`nemo_text_processing` is a Python package that is installed with the `nemo_toolkit`.
+
+See :doc:`NeMo Introduction <../starthere/intro>` for installation details.
+
+.. toctree::
+   :maxdepth: 1
+
+   text_normalization
+   inverse_text_normalization
+
+
diff --git a/docs/source/nemo_text_processing/inverse_text_normalization.rst b/docs/source/nemo_text_processing/inverse_text_normalization.rst
@@ -0,0 +1,27 @@
+Inverse Text Normalization
+==========================
+
+Inverse text normalization (ITN), also called denormalization, is a part of the Automatic Speech Recognition (ASR) post-processing pipeline.
+ITN is the task of converting the raw spoken output of the ASR model into its written form to improve text readability.
+
+For example,
+`"in nineteen seventy"` -> `"in 1970"`
+and `"one hundred and twenty three dollars"` -> `"$123"`.
+
+This tool is based on WFST-grammars :cite:`tools-itn-mohri2009`. We also provide a deployment route to C++ using Sparrowhawk -- an open-source version of Google Kestrel :cite:`tools-itn-ebden2015kestrel`.
+See :doc:`ITN Deployment <../tools/inverse_text_normalization_deployment>` for details.
+
+.. note::
+
+    For more details, see the tutorial `NeMo/tutorials/tools/Inverse_Text_Normalization.ipynb <https://github.com/NVIDIA/NeMo/blob/main/tutorials/tools/Inverse_Text_Normalization.ipynb>`__.
+
+
+
+
+References
+----------
+
+.. bibliography:: tools_all.bib
+    :style: plain
+    :labelprefix: TOOLS-ITN
+    :keyprefix: tools-itn-
diff --git a/docs/source/nemo_text_processing/text_normalization.rst b/docs/source/nemo_text_processing/text_normalization.rst
@@ -0,0 +1,40 @@
+Text Normalization
+==================
+
+This tool converts text from its written form into its verbalized form, including numbers and dates. For example, `10:00` -> `ten o'clock` and `10kg` -> `ten kilograms`.
+Text normalization is used as a preprocessing step before Text to Speech (TTS). It can also be used to preprocess Automatic Speech Recognition (ASR) training transcripts.
+
+.. note::
+
+    We recommend you try the tutorial `NeMo/tutorials/tools/Text_Normalization_Tutorial.ipynb <https://github.com/NVIDIA/NeMo/blob/main/tutorials/tools/Text_Normalization_Tutorial.ipynb>`__.
+
+
+Prediction
+----------------------------------
+
+Example prediction run:
+
+.. code::
+
+    python run_prediction.py  --input=<INPUT_TEXT_FILE> --output=<OUTPUT_PATH>
+
+
+Evaluation
+----------------------------------
+
+Example evaluation run:
+
+.. code::
+
+    python run_evaluation.py  --input=./en_with_types/output-00001-of-00100 [--cat CLASS_CATEGORY]
+
+
+References
+----------
+
+.. bibliography:: tools_all.bib
+    :style: plain
+    :labelprefix: TOOLS-NORM
+    :keyprefix: tools-norm-
+
+
diff --git a/docs/source/nemo_text_processing/tools_all.bib b/docs/source/nemo_text_processing/tools_all.bib
@@ -0,0 +1,51 @@
+@inproceedings{kurzinger2020ctc,
+  title={CTC-segmentation of large corpora for german end-to-end speech recognition},
+  author={K{\"u}rzinger, Ludwig and Winkelbauer, Dominik and Li, Lujun and Watzel, Tobias and Rigoll, Gerhard},
+  booktitle={International Conference on Speech and Computer},
+  pages={267--278},
+  year={2020},
+  organization={Springer}
+}
+
+@article{ebden2015kestrel,
+  title={The Kestrel TTS text normalization system},
+  author={Ebden, Peter and Sproat, Richard},
+  journal={Natural Language Engineering},
+  volume={21},
+  number={3},
+  pages={333},
+  year={2015},
+  publisher={Cambridge University Press}
+}
+
+@article{sproat2016rnn,
+  title={RNN approaches to text normalization: A challenge},
+  author={Sproat, Richard and Jaitly, Navdeep},
+  journal={arXiv preprint arXiv:1611.00068},
+  year={2016}
+}
+
+@book{taylor2009text,
+  title={Text-to-speech synthesis},
+  author={Taylor, Paul},
+  year={2009},
+  publisher={Cambridge university press}
+}
+
+
+@Inbook{mohri2009,
+author="Mohri, Mehryar",
+editor="Droste, Manfred
+and Kuich, Werner
+and Vogler, Heiko",
+title="Weighted Automata Algorithms",
+bookTitle="Handbook of Weighted Automata",
+year="2009",
+publisher="Springer Berlin Heidelberg",
+address="Berlin, Heidelberg",
+pages="213--254",
+abstract="Weighted automata and transducers are widely used in modern applications in bioinformatics and text, speech, and image processing. This chapter describes several fundamental weighted automata and shortest-distance algorithms including composition, determinization, minimization, and synchronization, as well as single-source and all-pairs shortest distance algorithms over general semirings. It presents the pseudocode of these algorithms, gives an analysis of their running time complexity, and illustrates their use in some simple cases. Many other complex weighted automata and transducer algorithms used in practice can be obtained by combining these core algorithms.",
+isbn="978-3-642-01492-5",
+doi="10.1007/978-3-642-01492-5_6",
+url="https://doi.org/10.1007/978-3-642-01492-5_6"
+}
diff --git a/docs/source/nemo_tools/intro.rst b/docs/source/nemo_tools/intro.rst
diff --git a/docs/source/nemo_tools/text_denormalization.rst b/docs/source/nemo_tools/text_denormalization.rst
diff --git a/docs/source/nemo_tools/text_normalization.rst b/docs/source/nemo_tools/text_normalization.rst
diff --git a/docs/source/nemo_tools/tools_all.bib b/docs/source/nemo_tools/tools_all.bib
diff --git a/docs/source/starthere/tutorials.rst b/docs/source/starthere/tutorials.rst
@@ -107,8 +107,8 @@ To run tutorials:
      - CTC Segmentation
      - `CTC Segmentation <https://colab.research.google.com/github/NVIDIA/NeMo/blob/r1.0.0rc1/tutorials/tools/CTC_Segmentation_Tutorial.ipynb>`_
    * - Tools
-     - Text Normalization for Text To Speech
+     - Text Normalization for TTS
      - `Text Normalization <https://colab.research.google.com/github/NVIDIA/NeMo/blob/r1.0.0rc1/tutorials/tools/Text_Normalization_Tutorial.ipynb>`_
    * - Tools
-     - Text Denormalization for Text To Speech
-     - `Text Denormalization(Inverse_Text_Normalization) <https://colab.research.google.com/github/NVIDIA/NeMo/blob/r1.0.0rc1/tutorials/tools/Text_Denormalization_(Inverse_Text_Normalization).ipynb>`_
+     - Inverse Text Normalization for ASR
+     - `Inverse Text Normalization <https://colab.research.google.com/github/NVIDIA/NeMo/blob/main/tutorials/tools/Inverse_Text_Normalization.ipynb>`_
diff --git a/docs/source/tools/intro.rst b/docs/source/tools/intro.rst
@@ -9,5 +9,6 @@ NeMo provides a set of tools useful for developing Automatic Speech Recognitions
 
    ctc_segmentation
    speech_data_explorer
+   inverse_text_normalization_deployment
 
 
diff --git a/tools/text_denormalization/README.rst → ...inverse_text_normalization_deployment.rst b/tools/text_denormalization/README.rst → ...inverse_text_normalization_deployment.rst
@@ -1,19 +1,15 @@
-**NeMo/tools/text_denormalization**
-=========================================
-
-Introduction
-------------
-
-This folder provides scripts to deploy WFST-based grammars in `nemo_tools <https://github.com/NVIDIA/NeMo/blob/main/nemo_tools>`_ for
-Inverse Text Normalization system in production.
+Inverse Text Normalization Deployment
+===============================================
 
+This tool deploys :doc:`NeMo Inverse Text Normalization <../nemo_text_processing/inverse_text_normalization>` for production.
+It uses Sparrowhawk -- an open-source version of Google Kestrel :cite:`tools-itn-ebden2015kestrel`.
 
 Requirements
 ------------------------
 
-1) nemo_tools
+1) :doc:`nemo_text_processing <../nemo_text_processing/intro>` package
 
-See NeMo `README <https://github.com/NVIDIA/NeMo/blob/main/README.rst>`_ for installation guide.
+See :doc:`NeMo Introduction <../starthere/intro>` for installation details.
 
 
 Usage
@@ -27,7 +23,7 @@ Automatically start docker container with production backend with plugged in gra
 
 This script runs the following steps in sequence:
 
-Exports grammars `tokenize_and_classify_tmp.far` and `verbalize_tmp.far` from nemo_tools to directory `classify/` and `verbalize/` respectively
+Exports grammars `tokenize_and_classify_tmp.far` and `verbalize_tmp.far` from `nemo_text_processing` to directory `classify/` and `verbalize/` respectively
 
 .. code-block:: bash
 
@@ -61,4 +57,12 @@ Runs Inverse Text Normalization:
 
     echo "two dollars fifty" | ../../src/bin/normalizer_main --config=sparrowhawk_configuration.ascii_proto
 
-This returns $2.50.
+This returns $2.50.
+
+References
+----------
+
+.. bibliography:: tools_all.bib
+    :style: plain
+    :labelprefix: TOOLS-ITN
+    :keyprefix: tools-itn-
diff --git a/nemo_text_processing/README.rst b/nemo_text_processing/README.rst
@@ -0,0 +1,8 @@
+**nemo_text_processing**
+==========================
+
+Introduction
+------------
+
+NeMo's `nemo_text_processing` is a Python package that is installed with the `nemo_toolkit`. 
+See `documentation <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/>`_ for details.
diff --git a/nemo_tools/__init__.py → nemo_text_processing/__init__.py b/nemo_tools/__init__.py → nemo_text_processing/__init__.py