Vocoder matching #427
Conversation
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #427      +/-   ##
==========================================
- Coverage   75.98%   73.94%   -2.04%
==========================================
  Files          42       43       +1
  Lines        2757     2618     -139
  Branches      455      404      -51
==========================================
- Hits         2095     1936     -159
- Misses        577      602      +25
+ Partials       85       80       -5

☔ View full report in Codecov by Sentry.
Force-pushed from d872092 to 241493a
Force-pushed from 2ba750a to 466f21d
Reading the code, I don't see any obvious problems or refactoring opportunities, barring the trivial question of not redefining complete_path(). e2e desperately needs unit testing, though; if you can think of a way to add some while you're working on this, that would be great. utils/heavy.py has little unit testing too, but that should actually be easy to add, since the functions don't need complex context to get exercised.
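(To make that concrete, a hedged sketch of how a context-free helper could be tested with plain pytest. `dynamic_range_compression` below is a stand-in written for illustration, not necessarily the real name or signature of anything in utils/heavy.py.)

```python
# Hypothetical sketch: unit-testing a context-free helper with pytest.
import torch


def dynamic_range_compression(x: torch.Tensor, clip_val: float = 1e-5) -> torch.Tensor:
    # Stand-in for a utils/heavy.py-style helper: log-compress a spectrogram.
    return torch.log(torch.clamp(x, min=clip_val))


def test_dynamic_range_compression_is_monotonic():
    x = torch.tensor([0.0, 0.5, 1.0, 2.0])
    y = dynamic_range_compression(x)
    # log is monotonic, so compression must preserve the ordering of values.
    assert torch.all(y[1:] >= y[:-1])
```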
```python
text = self._load_file(basename, speaker, language, "text", "text.pt")
raw_text = item["raw_text"]
if self.config.feature_prediction.model.learn_alignment:
    match self.config.feature_prediction.model.target_text_representation_level:
```
Do we have an existing harness to test this? This entire match statement seems to have no unit testing at all.
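As a hedged sketch of what such a harness could look like (the level names "characters" and "phones" and the overall shape are assumptions, not taken from the repo): build a minimal stand-in config with SimpleNamespace and parametrize over the representation levels.

```python
# Hypothetical pytest sketch for exercising the match statement above.
# The config shape mirrors the attribute path used in the diff; the level
# names are assumed, not confirmed from the repository.
from types import SimpleNamespace

import pytest


def make_config(level: str, learn_alignment: bool = True) -> SimpleNamespace:
    model = SimpleNamespace(
        learn_alignment=learn_alignment,
        target_text_representation_level=level,
    )
    return SimpleNamespace(feature_prediction=SimpleNamespace(model=model))


@pytest.mark.parametrize("level", ["characters", "phones"])
def test_each_representation_level_is_handled(level):
    config = make_config(level)
    # Feed `config` to whatever object owns the match statement and assert
    # on its output; here we only show the config plumbing itself.
    assert config.feature_prediction.model.target_text_representation_level == level
```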
also turn process_text into a static method
Sometimes we might want to run something when initializing the model or after loading a checkpoint, e.g. freezing parameters when fine-tuning the vocoder.
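For illustration, a minimal sketch assuming the models are PyTorch Lightning modules (as the *_lightning companion repos suggest); the class, submodule, and flag names are placeholders, not EveryVoice's actual API.

```python
# Illustrative only: freezing vocoder parameters after loading a checkpoint.
import pytorch_lightning as pl
import torch.nn as nn


class VocoderMatchingModel(pl.LightningModule):
    def __init__(self, finetune_vocoder: bool = False):
        super().__init__()
        self.finetune_vocoder = finetune_vocoder
        self.vocoder = nn.Linear(80, 1)  # placeholder submodule

    def on_load_checkpoint(self, checkpoint: dict) -> None:
        # Runs right after a checkpoint is restored: a natural place to
        # freeze parameters when fine-tuning around a fixed vocoder.
        if self.finetune_vocoder:
            for param in self.vocoder.parameters():
                param.requires_grad = False
```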
Force-pushed from 204d1fb to e8a6da2
I am having issues with the PR: I am unable to run through the "fine-tune" guide. I tried training a couple of new models from scratch (ENG/LJ & MOH) using [dev.ap/vocoder-match]. When I try the command below, it fails with this error (or see the attached file for the full log message):
Force-pushed from 1e357f3 to e8a6da2
I am seeing this with some models too. Thanks for checking, Marc. I will investigate.
Another big one.
```python
if isinstance(structure, dict):
    result = []
    for key, value in structure.items():
        result.extend(find_non_basic_substructures(key))
```
Can keys be non-basic structures? Maybe I should phrase it as: do we use keys that are non-basic structures?
I don't think so...
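For what it's worth, Python itself constrains this: dict keys must be hashable, so lists and dicts can never be keys, but tuples can, and a tuple may contain non-basic elements. A minimal sketch of the whole traversal, with everything outside the dict branch reconstructed as an assumption rather than copied from the PR:

```python
# Assumed shape of the recursive check under review; only the dict branch
# appears in the diff above, the rest is reconstructed for illustration.
BASIC_TYPES = (str, int, float, bool, type(None))


def find_non_basic_substructures(structure):
    if isinstance(structure, BASIC_TYPES):
        return []
    if isinstance(structure, dict):
        result = []
        for key, value in structure.items():
            # Keys are checked too: tuples are hashable, so a key *can*
            # be a non-basic structure even though lists/dicts cannot.
            result.extend(find_non_basic_substructures(key))
            result.extend(find_non_basic_substructures(value))
        return result
    if isinstance(structure, (list, tuple, set)):
        result = []
        for item in structure:
            result.extend(find_non_basic_substructures(item))
        return result
    return [structure]
```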
```diff
 logger.error(
-    f"Sorry you do not have enough {target_text_representation_level} data in your current training/validation filelists to train/validate with a batch size of {batch_size}."
+    f"Sorry you do not have enough {target_text_representation_level} data in your current {name} filelist to run the model with a batch size of {batch_size}."
```
The batch size should be a maximum, not a minimum. If you have fewer examples, you should still be able to run with a batch that isn't "full".
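Something like this hedged sketch (placeholder names, not the project's actual datamodule code): clamp the effective batch size to the filelist length instead of erroring out.

```python
# Sketch: treat batch_size as an upper bound rather than a requirement.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(3, 80))  # e.g. only 3 validation examples
batch_size = 16

# Clamp instead of raising: a partial ("not full") batch is still runnable.
effective_batch_size = min(batch_size, len(dataset))
loader = DataLoader(dataset, batch_size=effective_batch_size)
```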
remove unused function, reduce losses during generator warmup
Force-pushed from c14e214 to 28e878b
PR Goal?
This PR aims to make it easier to match an EV vocoder with an EV text-to-spec model.
Fixes?
#302
Feedback sought?
@marctessier - Please try it out by following the documentation
@SamuelLarkin and @MENGZHEGENG - Please review to understand the teacher forcing procedure here and how the datamodule has changed for FastSpeech2
@joanise - Please help sanity check, and identify any refactoring/testing improvements I can make
Thank you all! Everyone who is not Marc is of course welcome to try some vocoder matching too :) I just thought I would be concise about what I'm looking for from each of you.
Priority?
high - it touches a fair number of things and makes breaking changes to the synthesize API
Tests added?
None - most of this changes the model code, which we don't really test. But I'd also like to add more tests before merging this.
How to test?
The documentation provides steps on how to do it. Please use a pre-trained vocoder (the original HiFiGAN checkpoint doesn't work anymore; instead, please use sgile/models/hifigan/hifigan_universal_v1_everyvoice.ckpt).
Confidence?
medium - I've tested this out and it really improves the synthesis, but I feel there is some refactoring that could be done, and the documentation could be improved
Version change?
N/A
Related PRs
EveryVoiceTTS/HiFiGAN_iSTFT_lightning#31
EveryVoiceTTS/FastSpeech2_lightning#75
EveryVoiceTTS/DeepForcedAligner#22