[TTS] Add tutorial for TTS data prep scripts #6922

Merged
merged 4 commits into from
Jul 12, 2023

Conversation

rlangman
Collaborator

What does this PR do?

Add a tutorial demonstrating end-to-end data preparation and training with the new TTS preprocessing scripts and data loader.

Collection: [TTS]

Changelog

  • Create tutorial

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

XuesongYang
XuesongYang previously approved these changes Jul 8, 2023
Collaborator

@XuesongYang XuesongYang left a comment

Sorry for the late review. LGTM, added some nitpicks.

"source": [
"In this tutorial, we will prepare a dataset using our [TTS Dataset Processing Scripts](https://github.com/NVIDIA/NeMo/tree/main/scripts/dataset_processing/tts) and use it for training a FastPitch model.\n",
"\n",
"**This tutorial uses a different workflow than all other existing TTS tutorials. The scripts and classes used are all experimental and not yet ready for production**"
Collaborator

nit: missing a period at the end of the sentence.

{
"cell_type": "markdown",
"source": [
"# Dataset Prepration"
Collaborator

s/Prepration/Preparation/

"cell_type": "code",
"source": [
"import IPython.display as ipd\n",
"from matplotlib.pyplot import imshow"
Collaborator

Remove this import, since nothing else uses it.

"source": [
"We can use [create_speaker_map.py](https://github.com/NVIDIA/NeMo/blob/main/scripts/dataset_processing/tts/create_speaker_map.py) to easily create a mapping from speaker ID strings to integer indices that will be used at training time.\n",
"\n",
"The script will simply sort the speaker IDs and assign them numbers [0, num_speakers) in alphabetical order."
Collaborator

s/[0, num_speakers)/`[0, num_speakers)`/
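
For readers following the quoted cell, the mapping it describes can be sketched in a few lines. This is a minimal illustration of "sort the speaker IDs and number them", not the code of `create_speaker_map.py`; the function name `create_speaker_map` and the sample speaker IDs are hypothetical.

```python
def create_speaker_map(speaker_ids):
    """Map each unique speaker ID string to an integer index in [0, num_speakers).

    Hypothetical helper for illustration only; the real script may read
    manifests and write its output differently.
    """
    # Deduplicate, then sort so the assignment is deterministic.
    unique_sorted = sorted(set(speaker_ids))
    # enumerate() yields indices 0, 1, ..., num_speakers - 1.
    return {speaker: index for index, speaker in enumerate(unique_sorted)}


# Made-up speaker IDs, with one duplicate.
speaker_map = create_speaker_map(["p250", "p225", "p231", "p225"])
print(speaker_map)  # {'p225': 0, 'p231': 1, 'p250': 2}
```

Python's `sorted` compares strings by code point, so "alphabetical" here means uppercase before lowercase and `"10"` before `"9"`.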

{
"cell_type": "markdown",
"source": [
"Before training FastPitch, we need to compute some features for every audio file. The default [config file](https://github.com/NVIDIA/NeMo/blob/main/examples/tts/conf/feature/feature_44100.yaml) we will use has parameters for computing the **pitch** and **energy** of every audio frame."
Collaborator

nit: the font formatting for pitch and energy is inconsistent across the tutorial.

Collaborator Author

I went through the tutorial to try to make the formatting more consistent.

  • Use bold when it is the first time an important vocab term is mentioned.
  • Use code when it refers to specific code, variable name, file, etc.
  • Use italics to emphasize any other key words.
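
As a rough illustration of the per-frame energy feature mentioned in the quoted cell above, the sketch below computes one value per audio frame. It assumes energy means the root-mean-square amplitude of each frame; the referenced feature config may define it differently. The function `frame_rms_energy`, the parameter names `win_length` and `hop_length`, and their values are hypothetical.

```python
import math


def frame_rms_energy(samples, win_length, hop_length):
    """Return one RMS energy value per frame.

    Hypothetical helper: frames are `win_length` samples long and start
    every `hop_length` samples; a trailing partial frame is dropped.
    """
    energies = []
    for start in range(0, len(samples) - win_length + 1, hop_length):
        frame = samples[start:start + win_length]
        mean_square = sum(x * x for x in frame) / win_length
        energies.append(math.sqrt(mean_square))
    return energies


# Synthetic 0.1 s, 440 Hz tone at 44.1 kHz (illustrative values only).
sample_rate = 44100
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate // 10)]
energy = frame_rms_energy(tone, win_length=2048, hop_length=512)
print(len(energy))  # number of frames
```

For a sine wave of amplitude 0.5, each frame's RMS should be close to 0.5 / sqrt(2).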

"For training it is beneficial for us to *normalize* our features. The most standard approach is to apply **mean-variance normalization** so that each feature has a mean of 0 and variance of 1. To do this we need to compute the *dataset statistics* with the mean and variance of each feature.\n",
"\n",
"For TTS it also helps\n",
"* Normalize features using speaker-level statistics\n",
Collaborator

missing a period.
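
The normalization described in the quoted cell can be sketched as follows. This is a minimal example of mean-variance normalization with speaker-level statistics, assuming each record is a `(speaker, values)` pair; the helper names and the sample pitch values are hypothetical and do not reflect the tutorial's scripts.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def compute_stats(values):
    """Return (mean, std) of a list of floats, using the population std."""
    return fmean(values), pstdev(values)


def normalize(values, mean, std, eps=1e-8):
    """Shift to zero mean and scale to unit variance; eps guards against std == 0."""
    return [(v - mean) / max(std, eps) for v in values]


def speaker_stats(records):
    """Pool feature values per speaker and return {speaker: (mean, std)}."""
    grouped = defaultdict(list)
    for speaker, values in records:
        grouped[speaker].extend(values)
    return {speaker: compute_stats(vals) for speaker, vals in grouped.items()}


# Made-up per-frame pitch values (Hz) for two speakers.
records = [
    ("spk_a", [100.0, 120.0, 140.0]),
    ("spk_b", [200.0, 220.0, 240.0]),
]
stats = speaker_stats(records)
normalized = {speaker: normalize(values, *stats[speaker])
              for speaker, values in records}
```

With speaker-level statistics, both speakers end up on the same scale even though their raw pitch ranges differ, which is the motivation the quoted cell gives for TTS.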

@XuesongYang XuesongYang merged commit 728403d into main Jul 12, 2023
14 of 15 checks passed
@XuesongYang XuesongYang deleted the tts_tutorial branch July 12, 2023 21:08