Vocoder matching #427
Conversation
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #427      +/-   ##
==========================================
- Coverage   75.98%   73.94%   -2.04%
==========================================
  Files          42       43       +1
  Lines        2757     2618     -139
  Branches      455      404      -51
==========================================
- Hits         2095     1936     -159
- Misses        577      602      +25
+ Partials       85       80       -5

☔ View full report in Codecov by Sentry.
Force-pushed from d872092 to 241493a
Force-pushed from 2ba750a to 466f21d
Reading the code, I don't see any obvious problems or refactoring opportunities, barring the trivial question of not redefining complete_path(). e2e desperately needs unit testing, though; if you can think of a way to add some while you're working on this, that would be great. utils/heavy.py has little unit testing too, but that should actually be easy to add, since the functions don't need complex context to get exercised.
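(To make that concrete, a hedged sketch of how a context-free helper could be tested with plain pytest. `dynamic_range_compression` below is a stand-in written for illustration, not necessarily the real name or signature of anything in utils/heavy.py.)

```python
# Hypothetical sketch: unit-testing a context-free helper with pytest.
import torch


def dynamic_range_compression(x: torch.Tensor, clip_val: float = 1e-5) -> torch.Tensor:
    # Stand-in for a utils/heavy.py-style helper: log-compress a spectrogram.
    return torch.log(torch.clamp(x, min=clip_val))


def test_dynamic_range_compression_is_monotonic():
    x = torch.tensor([0.0, 0.5, 1.0, 2.0])
    y = dynamic_range_compression(x)
    # log is monotonic, so compression must preserve the ordering of values.
    assert torch.all(y[1:] >= y[:-1])
```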
```python
text = self._load_file(basename, speaker, language, "text", "text.pt")
raw_text = item["raw_text"]
if self.config.feature_prediction.model.learn_alignment:
    match self.config.feature_prediction.model.target_text_representation_level:
```
Do we have an existing harness to test this? This entire match statement seems to have no unit testing at all.
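As a hedged sketch of what such a harness could look like (the level names "characters" and "phones" and the overall shape are assumptions, not taken from the repo): build a minimal stand-in config with SimpleNamespace and parametrize over the representation levels.

```python
# Hypothetical pytest sketch for exercising the match statement above.
# The config shape mirrors the attribute path used in the diff; the level
# names are assumed, not confirmed from the repository.
from types import SimpleNamespace

import pytest


def make_config(level: str, learn_alignment: bool = True) -> SimpleNamespace:
    model = SimpleNamespace(
        learn_alignment=learn_alignment,
        target_text_representation_level=level,
    )
    return SimpleNamespace(feature_prediction=SimpleNamespace(model=model))


@pytest.mark.parametrize("level", ["characters", "phones"])
def test_each_representation_level_is_handled(level):
    config = make_config(level)
    # Feed `config` to whatever object owns the match statement and assert
    # on its output; here we only show the config plumbing itself.
    assert config.feature_prediction.model.target_text_representation_level == level
```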
also turn process_text into a static method
Sometimes we might want to run something when initializing the model or after loading a checkpoint, e.g. freezing parameters when fine-tuning the vocoder.
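For illustration, a minimal sketch assuming the models are PyTorch Lightning modules (as the *_lightning companion repos suggest); the class, submodule, and flag names are placeholders, not EveryVoice's actual API.

```python
# Illustrative only: freezing vocoder parameters after loading a checkpoint.
import pytorch_lightning as pl
import torch.nn as nn


class VocoderMatchingModel(pl.LightningModule):
    def __init__(self, finetune_vocoder: bool = False):
        super().__init__()
        self.finetune_vocoder = finetune_vocoder
        self.vocoder = nn.Linear(80, 1)  # placeholder submodule

    def on_load_checkpoint(self, checkpoint: dict) -> None:
        # Runs right after a checkpoint is restored: a natural place to
        # freeze parameters when fine-tuning around a fixed vocoder.
        if self.finetune_vocoder:
            for param in self.vocoder.parameters():
                param.requires_grad = False
```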
Force-pushed from 204d1fb to e8a6da2
I am having issues with the PR: I am unable to run through the "fine-tune" guide. I tried training a couple of new models from scratch (ENG/LJ & MOH) using [dev.ap/vocoder-match]. When I try the command below, it fails with this error (or see the attached file for the full log message):
Force-pushed from 1e357f3 to e8a6da2
I am seeing this with some models too. Thanks for checking, Marc. I will investigate.
Another big one.
```python
if isinstance(structure, dict):
    result = []
    for key, value in structure.items():
        result.extend(find_non_basic_substructures(key))
```
Can keys be non-basic structures? Maybe I should phrase it as: do we use keys that are non-basic structures?
I don't think so...
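For what it's worth, Python itself constrains this: dict keys must be hashable, so lists and dicts can never be keys, but tuples can, and a tuple may contain non-basic elements. A minimal sketch of the whole traversal, with everything outside the dict branch reconstructed as an assumption rather than copied from the PR:

```python
# Assumed shape of the recursive check under review; only the dict branch
# appears in the diff above, the rest is reconstructed for illustration.
BASIC_TYPES = (str, int, float, bool, type(None))


def find_non_basic_substructures(structure):
    if isinstance(structure, BASIC_TYPES):
        return []
    if isinstance(structure, dict):
        result = []
        for key, value in structure.items():
            # Keys are checked too: tuples are hashable, so a key *can*
            # be a non-basic structure even though lists/dicts cannot.
            result.extend(find_non_basic_substructures(key))
            result.extend(find_non_basic_substructures(value))
        return result
    if isinstance(structure, (list, tuple, set)):
        result = []
        for item in structure:
            result.extend(find_non_basic_substructures(item))
        return result
    return [structure]
```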
```diff
 logger.error(
-    f"Sorry you do not have enough {target_text_representation_level} data in your current training/validation filelists to train/validate with a batch size of {batch_size}."
+    f"Sorry you do not have enough {target_text_representation_level} data in your current {name} filelist to run the model with a batch size of {batch_size}."
```
The batch size should be a maximum, not a minimum. If you have fewer examples, you should still be able to run with a batch that isn't "full".
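Something like this hedged sketch (placeholder names, not the project's actual datamodule code): clamp the effective batch size to the filelist length instead of erroring out.

```python
# Sketch: treat batch_size as an upper bound rather than a requirement.
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(3, 80))  # e.g. only 3 validation examples
batch_size = 16

# Clamp instead of raising: a partial ("not full") batch is still runnable.
effective_batch_size = min(batch_size, len(dataset))
loader = DataLoader(dataset, batch_size=effective_batch_size)
```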
remove unused function, reduce losses during generator warmup
Force-pushed from c14e214 to 28e878b
PR Goal?
This PR aims to make it easier to match an EV vocoder with an EV text-to-spec model.
Fixes?
#302
Feedback sought?
@marctessier - Please try it out by following the documentation
@SamuelLarkin and @MENGZHEGENG - Please review to understand the teacher forcing procedure here and how the datamodule has changed for FastSpeech2
@joanise - Please help sanity check, and identify any refactoring/testing improvements I can make
Thank you all! Everyone who is not Marc is of course welcome to try some vocoder matching too :) I just thought I would be concise about what I'm looking for from each of you.
Priority?
high - it touches a fair number of things and makes breaking changes to the synthesize API
Tests added?
None - most of this changes the model code, which we don't really test. But I'd also like to add more tests before merging this.
How to test?
The documentation provides steps on how to do it. Please use a pre-trained vocoder (the original HiFiGAN checkpoint doesn't work anymore; instead, please use sgile/models/hifigan/hifigan_universal_v1_everyvoice.ckpt).
Confidence?
medium - I've tested this out and it really improves the synthesis, but I feel there is some refactoring that could be done, and the documentation could be improved
Version change?
N/A
Related PRs
EveryVoiceTTS/HiFiGAN_iSTFT_lightning#31
EveryVoiceTTS/FastSpeech2_lightning#75
EveryVoiceTTS/DeepForcedAligner#22