Vocoder matching #427

Merged
merged 22 commits into from
May 24, 2024

Conversation

roedoejet
Member

@roedoejet roedoejet commented May 14, 2024

PR Goal?

This PR aims to make it easier to match an EV vocoder with an EV text-to-spec model.

Fixes?

#302

Feedback sought?

@marctessier - Please try it out by following the documentation
@SamuelLarkin and @MENGZHEGENG - Please review to understand the teacher forcing procedure here and how the datamodule has changed for FastSpeech2
@joanise - Please help sanity check, and identify any refactoring/testing improvements I can make

Thank you all! Everyone who is not Marc is also of course welcome to try to do some vocoder matching too :) I just thought I would try and be concise about what I'm looking for from everybody.

Priority?

high - since it touches a fair number of things and makes breaking changes to the synthesize API

Tests added?

None - most of this is changing the model code, which we don't really test. But I'd like to also add more tests before merging this.

How to test?

The documentation provides steps on how to do it. Please use a pre-trained vocoder (the original HiFiGAN checkpoint doesn't work anymore; instead, please use sgile/models/hifigan/hifigan_universal_v1_everyvoice.ckpt).

Confidence?

medium - I've tested this out and it really improves the synthesis, but I feel like there is some refactoring that could be done, and improvements to the documentation

Version change?

N/A

Related PRs

EveryVoiceTTS/HiFiGAN_iSTFT_lightning#31
EveryVoiceTTS/FastSpeech2_lightning#75
EveryVoiceTTS/DeepForcedAligner#22

@roedoejet roedoejet changed the base branch from main to dev.ap/remove-hfg May 14, 2024 00:20

codecov bot commented May 14, 2024

Codecov Report

Attention: Patch coverage is 44.57831%, with 92 lines in your changes missing coverage. Please review.

Project coverage is 73.94%. Comparing base (8898f3b) to head (28e878b).

| Files | Patch % | Lines |
|---|---|---|
| everyvoice/model/e2e/model.py | 7.81% | 59 Missing ⚠️ |
| everyvoice/model/e2e/dataset.py | 13.04% | 20 Missing ⚠️ |
| everyvoice/config/validation_helpers.py | 91.07% | 3 Missing and 2 partials ⚠️ |
| everyvoice/dataloader/__init__.py | 16.66% | 5 Missing ⚠️ |
| everyvoice/base_cli/helpers.py | 0.00% | 2 Missing ⚠️ |
| everyvoice/preprocessor/preprocessor.py | 90.90% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #427      +/-   ##
==========================================
- Coverage   75.98%   73.94%   -2.04%     
==========================================
  Files          42       43       +1     
  Lines        2757     2618     -139     
  Branches      455      404      -51     
==========================================
- Hits         2095     1936     -159     
- Misses        577      602      +25     
+ Partials       85       80       -5     


@roedoejet roedoejet force-pushed the dev.ap/vocoder-match branch 2 times, most recently from d872092 to 241493a Compare May 14, 2024 17:32
Contributor

github-actions bot commented May 14, 2024

CLI load time: 0:00.27
Pull Request HEAD: 28e878b20b1864ace70803ee79c07476a6733b18
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package

Base automatically changed from dev.ap/remove-hfg to main May 15, 2024 01:33
@roedoejet roedoejet changed the title [WIP] Vocoder matching Vocoder matching May 15, 2024
Member

@joanise joanise left a comment

Reading the code, I don't see any obvious problems or refactoring opportunities, barring the trivial question of not redefining complete_path(). e2e desperately needs unit testing, though; if you can think of a way to add some while you're working on this, that would be great. utils/heavy.py has little unit testing too, but that should actually be easy to add, since the functions don't need complex context to get exercised.

everyvoice/model/e2e/cli.py Outdated Show resolved Hide resolved
text = self._load_file(basename, speaker, language, "text", "text.pt")
raw_text = item["raw_text"]
if self.config.feature_prediction.model.learn_alignment:
    match self.config.feature_prediction.model.target_text_representation_level:
Member

Do we have an existing harness to test this? This entire match statement seems to have no unit testing at all.

@marctessier
Collaborator

I am having issues with the PR. Unable to simulate the "fine-tune" guide.

I tried training a couple of new models from scratch (ENG/LJ & MOH) using "[dev.ap/vocoder-match]".

When I try this command below:
everyvoice synthesize from-text ./logs_and_checkpoints/FeaturePredictionExperiment/base/checkpoints/last.ckpt -O spec --filelist ./preprocessed/training_filelist.psv --teacher-forcing-folder ./preprocessed
...

It fails with this error:
RuntimeError: The size of tensor a (777) must match the size of tensor b (704) at non-singleton dimension 1

(or see the attached file for the full log message)
pr427.e2417889.txt

@roedoejet
Member Author

> I am having issues with the PR. Unable to simulate the "fine-tune" guide.
>
> I tried training a couple of new models from scratch (ENG/LJ & MOH) using "[dev.ap/vocoder-match]".
>
> When I try this command below: everyvoice synthesize from-text ./logs_and_checkpoints/FeaturePredictionExperiment/base/checkpoints/last.ckpt -O spec --filelist ./preprocessed/training_filelist.psv --teacher-forcing-folder ./preprocessed ...
>
> It fails with this error: RuntimeError: The size of tensor a (777) must match the size of tensor b (704) at non-singleton dimension 1
>
> (or see the attached file for the full log message) pr427.e2417889.txt

I am seeing this with some models too. Thanks for checking, Marc. I will investigate.

Collaborator

@SamuelLarkin SamuelLarkin left a comment

Another big one.

docs/guides/finetune.md Outdated Show resolved Hide resolved
if isinstance(structure, dict):
    result = []
    for key, value in structure.items():
        result.extend(find_non_basic_substructures(key))
Collaborator

Can keys be non-basic structures? Maybe I should phrase it as: do we use keys that are non-basic structures?

Member Author

I don't think so...
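
For context on the question above: Python allows any hashable object as a dict key, so a key can be non-basic even if the project's configs never use one. The sketch below is a hypothetical re-implementation for illustration, not the code in this PR. The name `find_non_basic`, the `BASIC_TYPES` tuple, and the choice to treat lists and tuples as plain containers are all assumptions.

```python
from pathlib import Path

# Assumption: these are the types we consider "basic" for this illustration.
BASIC_TYPES = (str, int, float, bool, type(None))


def find_non_basic(structure):
    """Return every nested element that is neither basic nor a plain container."""
    if isinstance(structure, BASIC_TYPES):
        return []
    if isinstance(structure, dict):
        result = []
        for key, value in structure.items():
            # Keys only need to be hashable, so they are checked as well.
            result.extend(find_non_basic(key))
            result.extend(find_non_basic(value))
        return result
    if isinstance(structure, (list, tuple)):
        result = []
        for item in structure:
            result.extend(find_non_basic(item))
        return result
    # Anything else (a Path, a custom object, ...) counts as non-basic.
    return [structure]


print(find_non_basic({"a": {"b": 2}}))          # no non-basic elements
print(find_non_basic({Path("a"): [1, "x"]}))    # the Path key is reported
```

If keys are always strings in practice, checking them costs little and returns nothing, so the call on `key` is harmless either way.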

logger.error(
f"Sorry you do not have enough {target_text_representation_level} data in your current training/validation filelists to train/validate with a batch size of {batch_size}."
f"Sorry you do not have enough {target_text_representation_level} data in your current {name} filelist to run the model with a batch size of {batch_size}."
Collaborator

The batch size should be a maximum size and not a minimum size. If you have fewer examples, you should still be able to run in batch size where the batch isn't "full".
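
A minimal sketch of the behaviour described in this comment, where the batch size is treated as an upper bound and a final partial batch is allowed. The helper name `iter_batches` and the sample utterance names are hypothetical and are not part of the EveryVoice dataloader.

```python
def iter_batches(items, batch_size):
    """Yield consecutive batches holding at most batch_size items each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        # Slicing past the end is safe, so the last batch may be smaller.
        yield items[start:start + batch_size]


utterances = ["utt1", "utt2", "utt3", "utt4", "utt5"]

# Fewer examples than the batch size: one partial batch instead of an error.
print(list(iter_batches(utterances, batch_size=16)))

# More examples than the batch size: the final batch holds the remainder.
print(list(iter_batches(utterances, batch_size=2)))
```

With this reading, having fewer examples than `batch_size` would warrant a warning at most, not an error.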

everyvoice/model/e2e/model.py Show resolved Hide resolved
everyvoice/model/e2e/model.py Show resolved Hide resolved
everyvoice/model/e2e/model.py Outdated Show resolved Hide resolved
@roedoejet roedoejet merged commit c4c7e52 into main May 24, 2024
2 of 4 checks passed
@roedoejet roedoejet deleted the dev.ap/vocoder-match branch May 24, 2024 17:56
4 participants