Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix broken links r1.18.0 #6501

Merged
merged 2 commits into from
Apr 26, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions tutorials/tools/CTC_Segmentation_Tutorial.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -280,7 +280,7 @@
"* `max_length` argument - max number of words in a segment for alignment (used only if there are no punctuation marks present in the original text. Long non-speech segments are better for segments split and are more likely to co-occur with punctuation marks. Random text split could deteriorate the quality of the alignment.\n",
"* out-of-vocabulary words will be removed based on pre-trained ASR model vocabulary, and the text will be changed to lowercase \n",
"* sentences for alignment with the original punctuation and capitalization will be stored under `$OUTPUT_DIR/processed/*_with_punct.txt`\n",
"* numbers will be converted from written to their spoken form with `num2words` package. For English, it's recommended to use NeMo normalization tool use `--use_nemo_normalization` argument (not supported if running this segmentation tutorial in Colab, see the text normalization tutorial: [`tutorials/text_processing/Text_Normalization.ipynb`](https://colab.research.google.com/github/NVIDIA/NeMo/blob/stable/tutorials/text_processing/Text_Normalization.ipynb) for more details). Even `num2words` normalization is usually enough for proper segmentation. However, it does not take audio into account. NeMo supports audio-based normalization for English, German and Russian languages that can be applied to the segmented data as a post-processing step. Audio-based normalization produces multiple normalization options. For example, `901` could be normalized as `nine zero one` or `nine hundred and one`. The audio-based normalization chooses the best match among the possible normalization options and the transcript based on the character error rate. Note, the audio-based normalization of long audio samples is not supported due to multiple normalization options. See [NeMo/nemo_text_processing/text_normalization/normalize_with_audio.py](https://github.com/NVIDIA/NeMo/blob/stable/nemo_text_processing/text_normalization/normalize_with_audio.py) for more details.\n",
"* numbers will be converted from written to their spoken form with `num2words` package. For English, it's recommended to use NeMo normalization tool use `--use_nemo_normalization` argument (not supported if running this segmentation tutorial in Colab, see the text normalization tutorial: [`https://github.com/NVIDIA/NeMo-text-processing/blob/main/tutorials/Text_(Inverse)_Normalization.ipynb`](https://colab.research.google.com/github/NVIDIA/NeMo-text-processing/blob/main/tutorials/Text_(Inverse)_Normalization.ipynb) for more details). Even `num2words` normalization is usually enough for proper segmentation. However, it does not take audio into account. NeMo supports audio-based normalization for English, German and Russian languages that can be applied to the segmented data as a post-processing step. Audio-based normalization produces multiple normalization options. For example, `901` could be normalized as `nine zero one` or `nine hundred and one`. The audio-based normalization chooses the best match among the possible normalization options and the transcript based on the character error rate. See [https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize_with_audio.py](https://github.com/NVIDIA/NeMo-text-processing/blob/main/nemo_text_processing/text_normalization/normalize_with_audio.py) for more details.\n",
"\n",
"### Audio preprocessing:\n",
"* non '.wav' audio files will be converted to `.wav` format\n",
Expand Down Expand Up @@ -714,4 +714,4 @@
]
}
]
}
}
2 changes: 1 addition & 1 deletion tutorials/tts/Pronunciation_customization.ipynb
Original file line number Diff line number Diff line change
Expand Up @@ -58,7 +58,7 @@
"* *[heteronyms](https://en.wikipedia.org/wiki/Heteronym_(linguistics))* - words with the same spelling but different pronunciations and/or meanings, e.g., *bass* (the fish) and *bass* (the musical instrument).\n",
"\n",
"#### Important NeMo flags:\n",
"* `your_spec_generator_model.vocab.g2p.phoneme_dict` - phoneme dictionary that maps words to their phonetic transcriptions, e.g., [ARPABET-based CMU Dictionary](https://github.com/NVIDIA/NeMo/blob/r1.14.0/scripts/tts_dataset_files/cmudict-0.7b_nv22.10) or [IPA-based CMU Dictionary](https://github.com/NVIDIA/NeMo/blob/r1.14.0/scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt)\n",
"* `your_spec_generator_model.vocab.g2p.phoneme_dict` - phoneme dictionary that maps words to their phonetic transcriptions, e.g., [ARPABET-based CMU Dictionary](https://raw.githubusercontent.com/NVIDIA/NeMo/stable/scripts/tts_dataset_files/cmudict-0.7b_nv22.10) or [IPA-based CMU Dictionary](https://github.com/NVIDIA/NeMo/blob/stable/scripts/tts_dataset_files/ipa_cmudict-0.7b_nv23.01.txt)\n",
"* `your_spec_generator_model.vocab.g2p.heteronyms` - list of the model's heteronyms, grapheme form of these words will be used even if the word is present in the phoneme dictionary.\n",
"* `your_spec_generator_model.vocab.g2p.ignore_ambiguous_words`: if is set to **True**, words with more than one phonetic representation in the pronunciation dictionary are ignored. This flag is relevant to the words with multiple valid phonetic transcriptions in the dictionary that are not in `your_spec_generator_model.vocab.g2p.heteronyms` list.\n",
"* `your_spec_generator_model.vocab.phoneme_probability` - phoneme probability flag in the Tokenizer and the same from in the G2P module: `your_spec_generator_model.vocab.g2p.phoneme_probability` ([0, 1]). If a word is present in the phoneme dictionary, we still want our TTS model to see graphemes and phonemes during training to handle OOV words during inference. The `phoneme_probability` determines the probability of an unambiguous dictionary word appearing in phonetic form during model training, `(1 - phoneme_probability)` is the probability of the graphemes. This flag is set to `1` in the parse() method during inference.\n",
Expand Down
Loading