[TTS] Fix aligner nan loss in fp32 #6435

Merged (2 commits, May 8, 2023)
nemo/collections/tts/losses/aligner_loss.py (4 changes: 1 addition & 3 deletions)

@@ -58,9 +58,7 @@ def forward(self, attn_logprob, in_lens, out_lens):
         # Convert to log probabilities
         # Note: Mask out probs beyond key_len
         key_inds = torch.arange(max_key_len + 1, device=attn_logprob.device, dtype=torch.long)
-        attn_logprob.masked_fill_(
-            key_inds.view(1, 1, -1) > key_lens.view(1, -1, 1), -float("inf")  # key_inds >= key_lens+1
-        )
+        attn_logprob.masked_fill_(key_inds.view(1, 1, -1) > key_lens.view(1, -1, 1), -1e15)  # key_inds >= key_lens+1
         attn_logprob = self.log_softmax(attn_logprob)

         # Target sequences
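As an aside on what the new fill value does (a minimal sketch with made-up tensors, not part of the PR): in fp32, exp(-1e15) underflows to exactly zero, so keys beyond key_len still receive zero probability mass after the softmax, but the masked log-probabilities stay finite instead of becoming -inf.

```python
import torch

# exp(-1e15) underflows to exactly 0 in fp32, so -1e15 still removes the
# masked keys' probability mass just like -inf would ...
print(torch.exp(torch.tensor(-1e15)))                      # tensor(0.)

# ... but the masked log-probabilities remain finite rather than -inf.
scores = torch.tensor([[1.0, 2.0, 0.5, -3.0]])             # illustrative logits
mask = torch.tensor([[False, False, True, True]])          # keys beyond key_len
masked = scores.masked_fill(mask, -1e15)
print(torch.softmax(masked, dim=-1))                       # ~[0.27, 0.73, 0.00, 0.00]
print(torch.log_softmax(masked, dim=-1)[0, 2].isfinite())  # tensor(True)
```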
Collaborator (on lines +61 to 62):

Should we make the masking logit value a parameter to the class, similar to "bank_logit"?

Collaborator:

Also, I am curious if anyone knows why this would produce NaN. Does it happen immediately or a long time into training?

Sounds like a pretty large bug if it happens for any fp32 training run.

Collaborator (author):

In the FastPitch and Mixer-TTS training tutorial, this would appear immediately. In my own FastPitch pre-training, it appeared after several epochs. I am not sure how this happens.
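
For what it is worth, one hedged illustration of how -inf masking can surface as NaN (a sketch with made-up values, not a diagnosis of the actual failure in these runs): if a whole row ends up at -inf, the stabilized log_softmax computes (-inf) - (-inf) = NaN, and even a single -inf log-prob turns into NaN as soon as it meets a zero weight or another -inf downstream.

```python
import torch

# A row that is entirely -inf: the max-subtraction inside the stable
# log_softmax computes (-inf) - (-inf), which is NaN.
row = torch.full((1, 4), -float("inf"))
print(torch.log_softmax(row, dim=-1))         # tensor([[nan, nan, nan, nan]])

# A single surviving -inf log-prob also becomes NaN the moment it is
# multiplied by a zero weight (a common padding pattern) or subtracted
# from another -inf.
logprob = torch.tensor(-float("inf"))
print(0.0 * logprob)                          # tensor(nan)
print(logprob - logprob)                      # tensor(nan)

# A large finite fill value such as -1e15 avoids all of these cases.
print(torch.log_softmax(torch.full((1, 4), -1e15), dim=-1))  # finite, = log(1/4)
```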

