forked from NVIDIA/NeMo
Commit
* refactoring text normalization docs and tutorial Signed-off-by: Yang Zhang <[email protected]>
* rename nemo tools to nemo text processing Signed-off-by: Yang Zhang <[email protected]>
* rename nemo_tools to nemo_text_processing Signed-off-by: Yang Zhang <[email protected]>
* rename docs Signed-off-by: Yang Zhang <[email protected]>
* rename files Signed-off-by: Yang Zhang <[email protected]>
* rename pytests Signed-off-by: Yang Zhang <[email protected]>
* fix pytest Signed-off-by: Yang Zhang <[email protected]>
* fix refactoring Signed-off-by: Yang Zhang <[email protected]>
* renamed functions in ITN tutorial Signed-off-by: Yang Zhang <[email protected]>
* fix typo Signed-off-by: Yang Zhang <[email protected]>
* fix path to tutorial in readme for ITN Signed-off-by: Yang Zhang <[email protected]>
* fix Jenkins Signed-off-by: Yang Zhang <[email protected]>
* fix Jenkins Signed-off-by: Yang Zhang <[email protected]>

Co-authored-by: Oleksii Kuchaiev <[email protected]>
Showing 135 changed files with 378 additions and 313 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
`nemo_text_processing` is a Python package that is installed with the `nemo_toolkit`. | ||
|
||
See :doc:`NeMo Introduction <../starthere/intro>` for installation details. | ||
|
||
.. toctree:: | ||
:maxdepth: 1 | ||
|
||
text_normalization | ||
inverse_text_normalization | ||
|
||
|
27 changes: 27 additions & 0 deletions
docs/source/nemo_text_processing/inverse_text_normalization.rst
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
Inverse Text Normalization | ||
========================== | ||
|
||
Inverse text normalization (ITN), also called denormalization, is a part of the Automatic Speech Recognition (ASR) post-processing pipeline. | ||
ITN is the task of converting the raw spoken output of the ASR model into its written form to improve text readability. | ||
|
||
For example, | ||
`"in nineteen seventy"` -> `"in 1970"` | ||
and `"one hundred and twenty three dollars"` -> `"$123"`. | ||
|
||
This tool is based on weighted finite-state transducer (WFST) grammars :cite:`tools-itn-mohri2009`. We also provide a deployment route to C++ using Sparrowhawk, an open-source version of Google Kestrel :cite:`tools-itn-ebden2015kestrel`. | ||
See :doc:`ITN Deployment <../tools/inverse_text_normalization_deployment>` for details. | ||
|
||
.. note:: | ||
|
||
For more details, see the tutorial `NeMo/tutorials/tools/Inverse_Text_Normalization.ipynb <https://github.com/NVIDIA/NeMo/blob/main/tutorials/tools/Inverse_Text_Normalization.ipynb>`__. | ||
|
||
|
||
|
||
|
||
References | ||
---------- | ||
|
||
.. bibliography:: tools_all.bib | ||
:style: plain | ||
:labelprefix: TOOLS-ITN | ||
:keyprefix: tools-itn- |
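
The ITN page above gives `"one hundred and twenty three dollars"` -> `"$123"` as an example. As a rough illustration of that spoken-to-written mapping, here is a minimal, self-contained Python sketch. It is not the WFST-based implementation in `nemo_text_processing`: the names `words_to_int` and `inverse_normalize_toy` are hypothetical, and the rules cover only small English cardinals and a trailing "dollar(s)". Years such as "nineteen seventy" would need a separate rule that this sketch does not attempt.

```python
# Toy inverse text normalization: spoken cardinals (and dollar amounts) -> digits.
# All names here are hypothetical and exist only for illustration.

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = ("hundred", "thousand")


def is_number_word(word):
    """Return True if the word is part of a spoken cardinal number."""
    return word in UNITS or word in TENS or word in SCALES


def words_to_int(words):
    """Convert cardinal words (no 'and') to an int; toy range below one million."""
    total, current = 0, 0
    for word in words:
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word == "thousand":
            total += max(current, 1) * 1000
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


def inverse_normalize_toy(text):
    """Replace runs of number words with digits; 'N dollars' becomes '$N'."""
    tokens = text.split()
    out = []
    i = 0
    while i < len(tokens):
        if not is_number_word(tokens[i]):
            out.append(tokens[i])
            i += 1
            continue
        run = []
        j = i
        while j < len(tokens):
            if is_number_word(tokens[j]):
                run.append(tokens[j])
                j += 1
            elif (
                tokens[j] == "and"
                and run[-1] in SCALES
                and j + 1 < len(tokens)
                and is_number_word(tokens[j + 1])
            ):
                j += 1  # skip the 'and' in 'one hundred and twenty'
            else:
                break
        value = words_to_int(run)
        if j < len(tokens) and tokens[j] in ("dollar", "dollars"):
            out.append(f"${value}")
            j += 1  # consume the currency word
        else:
            out.append(str(value))
        i = j
    return " ".join(out)


print(inverse_normalize_toy("one hundred and twenty three dollars"))
```

A grammar-based system replaces these hand-written loops with composable transducers, which makes it easier to add classes such as dates, times, and measures without the rules interfering with each other.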
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
Text Normalization | ||
================== | ||
|
||
This tool converts text from its written form into its verbalized form, including numbers and dates. For example, `10:00` -> `ten o'clock` and `10kg` -> `ten kilograms`. | ||
Text normalization is used as a preprocessing step before Text-to-Speech (TTS) synthesis. It can also be used to preprocess Automatic Speech Recognition (ASR) training transcripts. | ||
|
||
.. note:: | ||
|
||
We recommend you try the tutorial `NeMo/tutorials/tools/Text_Normalization_Tutorial.ipynb <https://github.com/NVIDIA/NeMo/blob/main/tutorials/tools/Text_Normalization_Tutorial.ipynb>`__. | ||
|
||
|
||
Prediction | ||
---------------------------------- | ||
|
||
Example prediction run: | ||
|
||
.. code:: | ||
|
||
    python run_prediction.py --input=<INPUT_TEXT_FILE> --output=<OUTPUT_PATH> | ||
|
||
Evaluation | ||
---------------------------------- | ||
|
||
Example evaluation run: | ||
|
||
.. code:: | ||
|
||
    python run_evaluation.py --input=./en_with_types/output-00001-of-00100 [--cat CLASS_CATEGORY] | ||
|
||
References | ||
---------- | ||
|
||
.. bibliography:: tools_all.bib | ||
:style: plain | ||
:labelprefix: TOOLS-NORM | ||
:keyprefix: tools-norm- | ||
|
||
|
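
The text normalization page above describes the written-to-verbalized direction with `10:00` -> `ten o'clock` and `10kg` -> `ten kilograms`. The following is a minimal Python sketch of those two cases using regular expressions. It is only an illustration under stated assumptions, not the package's grammar-based implementation: `int_to_words`, `normalize_toy`, and the `UNIT_NAMES` table are hypothetical, numbers are limited to 0-99, and units are always verbalized in the plural.

```python
import re

# Toy text normalization: written form -> verbalized form.
# All names and tables below are hypothetical, for illustration only.

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]
UNIT_NAMES = {"kg": "kilograms", "km": "kilometers", "g": "grams"}


def int_to_words(n):
    """Verbalize an integer in the toy range 0..99."""
    if not 0 <= n < 100:
        raise ValueError("toy verbalizer only handles 0..99")
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def normalize_toy(text):
    """Verbalize on-the-hour times and simple measures; leave the rest as-is."""
    # Times on the hour, e.g. "10:00" -> "ten o'clock".
    text = re.sub(
        r"\b(\d{1,2}):00\b",
        lambda m: f"{int_to_words(int(m.group(1)))} o'clock",
        text,
    )
    # Measures, e.g. "10kg" -> "ten kilograms". Longer unit symbols go first
    # so that "kg" is tried before "g".
    unit_pattern = "|".join(sorted(UNIT_NAMES, key=len, reverse=True))
    text = re.sub(
        rf"\b(\d{{1,2}})\s?({unit_pattern})\b",
        lambda m: f"{int_to_words(int(m.group(1)))} {UNIT_NAMES[m.group(2)]}",
        text,
    )
    return text


print(normalize_toy("meet at 10:00 and bring 10kg"))
```

Ordering matters even in this small sketch: the time rule runs before the measure rule so that each span is claimed by one class only, which is the kind of disambiguation a full normalizer has to handle systematically.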
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,51 @@ | ||
@inproceedings{kurzinger2020ctc, | ||
title={CTC-segmentation of large corpora for {G}erman end-to-end speech recognition}, | ||
author={K{\"u}rzinger, Ludwig and Winkelbauer, Dominik and Li, Lujun and Watzel, Tobias and Rigoll, Gerhard}, | ||
booktitle={International Conference on Speech and Computer}, | ||
pages={267--278}, | ||
year={2020}, | ||
organization={Springer} | ||
} | ||
|
||
@article{ebden2015kestrel, | ||
title={The Kestrel TTS text normalization system}, | ||
author={Ebden, Peter and Sproat, Richard}, | ||
journal={Natural Language Engineering}, | ||
volume={21}, | ||
number={3}, | ||
pages={333--353}, | ||
year={2015}, | ||
publisher={Cambridge University Press} | ||
} | ||
|
||
@article{sproat2016rnn, | ||
title={RNN approaches to text normalization: A challenge}, | ||
author={Sproat, Richard and Jaitly, Navdeep}, | ||
journal={arXiv preprint arXiv:1611.00068}, | ||
year={2016} | ||
} | ||
|
||
@book{taylor2009text, | ||
title={Text-to-speech synthesis}, | ||
author={Taylor, Paul}, | ||
year={2009}, | ||
publisher={Cambridge University Press} | ||
} | ||
|
||
|
||
@Inbook{mohri2009, | ||
author="Mohri, Mehryar", | ||
editor="Droste, Manfred | ||
and Kuich, Werner | ||
and Vogler, Heiko", | ||
title="Weighted Automata Algorithms", | ||
bookTitle="Handbook of Weighted Automata", | ||
year="2009", | ||
publisher="Springer Berlin Heidelberg", | ||
address="Berlin, Heidelberg", | ||
pages="213--254", | ||
abstract="Weighted automata and transducers are widely used in modern applications in bioinformatics and text, speech, and image processing. This chapter describes several fundamental weighted automata and shortest-distance algorithms including composition, determinization, minimization, and synchronization, as well as single-source and all-pairs shortest distance algorithms over general semirings. It presents the pseudocode of these algorithms, gives an analysis of their running time complexity, and illustrates their use in some simple cases. Many other complex weighted automata and transducer algorithms used in practice can be obtained by combining these core algorithms.", | ||
isbn="978-3-642-01492-5", | ||
doi="10.1007/978-3-642-01492-5_6", | ||
url="https://doi.org/10.1007/978-3-642-01492-5_6" | ||
} |
This file was deleted.
This file was deleted.
This file was deleted.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,8 @@ | ||
**nemo_text_processing** | ||
========================== | ||
|
||
Introduction | ||
------------ | ||
|
||
NeMo's `nemo_text_processing` is a Python package that is installed with the `nemo_toolkit`. | ||
See the `documentation <https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/>`_ for details. |
File renamed without changes.