Modify SpeechSynthesisDataset class, make it return text #1205
Looks good, but wouldn't you prefer to run G2P inside SpeechSynthesisDataset
to avoid spending time on it inside the training loop?
lhotse/dataset/speech_synthesis.py
Outdated
    }
    """

    def __init__(
        self,
        cuts: CutSet,
        return_cuts: bool = False,
This change is breaking. Can you move return_cuts
to a position towards the end of the parameter list?
OK.
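To illustrate the breaking-change concern raised above, here is a minimal sketch of what happens to an existing positional call when a new parameter is inserted in the middle of a signature. The class names and the `transforms` parameter are hypothetical stand-ins, not the actual `SpeechSynthesisDataset` signature.

```python
class DatasetBreaking:
    """Hypothetical signature with the new flag inserted right after `cuts`."""

    def __init__(self, cuts, return_cuts=False, transforms=None):
        self.cuts = cuts
        self.return_cuts = return_cuts
        self.transforms = transforms


class DatasetCompatible:
    """Hypothetical signature with the new flag appended at the end."""

    def __init__(self, cuts, transforms=None, return_cuts=False):
        self.cuts = cuts
        self.transforms = transforms
        self.return_cuts = return_cuts


# A call written against the old (cuts, transforms) positional order.
existing_args = (["cut-1"], [str.lower])

broken = DatasetBreaking(*existing_args)
compatible = DatasetCompatible(*existing_args)

# In the breaking variant, the transform list silently lands in `return_cuts`.
print(broken.return_cuts, broken.transforms)
# In the compatible variant, old callers keep their behaviour.
print(compatible.return_cuts, compatible.transforms)
```

Appending the new parameter (or making it keyword-only) keeps existing positional callers working without changes.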
Hi @pzelasko, I prefer to do the text normalization and tokenization in the separate training recipes, since they usually depend on different packages. @csukuangfj suggests doing this in the data preparation stage, with the converted phonemes saved to the manifests.
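A rough sketch of the data-preparation idea described above: run G2P once, offline, and store the resulting phonemes next to the raw text. Everything here is an assumption for illustration: the manifest is modelled as plain JSON lines rather than real lhotse objects, the `tokens` key is a hypothetical field name, and `toy_g2p` is a tiny lookup table standing in for a real G2P package.

```python
import json

# Hypothetical toy lexicon; a real recipe would call an actual G2P package.
TOY_LEXICON = {
    "hello": ["HH", "AH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}


def toy_g2p(text):
    """Map each known word to its phonemes; unknown words become <unk>."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(TOY_LEXICON.get(word, ["<unk>"]))
    return tokens


def add_tokens(manifest_lines):
    """Return new JSON lines with a `tokens` field added to every entry."""
    prepared = []
    for line in manifest_lines:
        entry = json.loads(line)
        entry["tokens"] = toy_g2p(entry["text"])  # raw text is kept as-is
        prepared.append(json.dumps(entry))
    return prepared


lines = [json.dumps({"id": "utt-1", "text": "Hello world"})]
prepared = add_tokens(lines)
print(prepared[0])
```

With the phonemes stored at preparation time, the dataset only has to pass them through, and the G2P dependency stays out of the training loop.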
OK cool. It looks like there are some conflicts after merging the other PR with multi-speaker support, could you resolve them?
Ok. Thanks. I have some local changes and will resolve the conflicts later. In addition, the current implementation loads the whole cut set and generates a char-based vocabulary from the given texts. But I think TTS recipes usually use phoneme tokens instead. Also, it might cause a token-ID mapping mismatch between training and test when two separate cut sets are given to this class. Shall we remove this and make this class return raw text and, optionally, pre-converted (phoneme) tokens?
lhotse/lhotse/dataset/speech_synthesis.py Line 43 in 7061175
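The token-ID mismatch mentioned above can be reproduced with a small sketch. It assumes a vocabulary built by sorting the unique characters of whatever texts are passed in; `build_char_vocab` is a hypothetical helper written for this illustration, not the code in `speech_synthesis.py`.

```python
def build_char_vocab(texts):
    """Assign IDs to the sorted set of characters seen in `texts`."""
    symbols = sorted({ch for text in texts for ch in text})
    return {ch: idx for idx, ch in enumerate(symbols)}


# Two cut sets with different transcripts, as in separate train/test splits.
train_texts = ["cab", "bad"]
test_texts = ["bed"]

train_vocab = build_char_vocab(train_texts)
test_vocab = build_char_vocab(test_texts)

# The same character receives a different ID in each vocabulary,
# and the test split contains a symbol the training vocabulary never saw.
print(train_vocab["b"], test_vocab["b"])
print("e" in train_vocab)
```

Returning raw text (and optionally pre-converted tokens) from the dataset leaves the choice of a single, shared token table to the recipe, which avoids this class of mismatch.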
I see that this class accepts
@pzelasko Thanks. It is ready to be merged.
Thanks, LGTM!
This PR is required by the TTS recipe k2-fsa/icefall#1372 in icefall. In this TTS recipe, we convert the transcript text to phonemes in training.