Question about Codec #12

Open
Paulmzr opened this issue Sep 5, 2024 · 2 comments

Comments


Paulmzr commented Sep 5, 2024

Hi, thanks for your great efforts. I noticed that you wrote "Meta's Encodec 24K version was also tested, but it could not be trained." Does that mean that using Meta's Encodec leads to poor performance?

CODEJIN (Owner) commented Sep 5, 2024

Dear @Paulmzr,

Hello,

The training itself was not successful. Afterward, I ran a few tests independently, and I have personally drawn the following conclusions.

  1. Combining the NaturalSpeech2 code from this repository with Encodec does not train properly in my current environment.

  2. The possible causes of this could be the following:

  • The written code is incomplete.
  • When a codec trained on a much wider range of external audio is used, the codec latent becomes too complex for the diffusion model to handle.
  • As the number of RVQ stacks increases, the complexity of the final latent increases, which likewise makes it difficult for the diffusion model to handle.
  • Learning the relationship between text and the codec latent may not converge without a very large batch size.

  3. Regarding the first and second causes listed above, considering that a certain level of training is possible with HiFi-Codec, I believe they are unlikely to be the main reasons, even if they contribute to the issue.

  4. The increase in complexity due to many RVQ stacks could be a real cause of the problem. HiFi-Codec, which does train, uses only 4 VQs and even splits the dimension in half, with 2 stacks for each half, which is a much simpler structure (see the first sketch after this list for how Encodec's codebook count grows with bandwidth).

  5. The need for a large batch size may be linked to that complexity and could also be a cause. However, it is difficult to verify with the time and GPU resources I have; even with gradient accumulation (sketched after this list), it is not easy to fully validate given the time constraints.
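To make the RVQ point concrete, here is a minimal sketch, assuming Meta's `encodec` PyTorch package, that prints how many residual codebooks the 24 kHz model uses at each target bandwidth and the shape of the continuous encoder latent that a NaturalSpeech2-style diffusion model would have to predict. The variable names are illustrative only and are not code from this repository.

```python
# Sketch: inspect how many RVQ codebooks Encodec 24 kHz uses per target
# bandwidth, and the continuous encoder latent a diffusion model would predict.
# Assumes the `encodec` package (facebookresearch/encodec); dummy audio only.
import torch
from encodec import EncodecModel

model = EncodecModel.encodec_model_24khz()
wav = torch.randn(1, 1, 24000)  # 1 second of dummy mono audio at 24 kHz

for bandwidth in (1.5, 3.0, 6.0, 12.0, 24.0):
    model.set_target_bandwidth(bandwidth)
    with torch.no_grad():
        frames = model.encode(wav)      # list of (codes, scale) tuples
    codes = frames[0][0]                # shape: [batch, n_q, time]
    print(f"{bandwidth:>4} kbps -> {codes.shape[1]} RVQ codebooks")

# Continuous latent before quantization (roughly [batch, 128, 75] for 1 s).
with torch.no_grad():
    latent = model.encoder(wav)
print("encoder latent shape:", latent.shape)
```

The higher bandwidths add more residual codebooks, which is the growth in latent complexity referred to in point 4.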
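On the batch-size point, gradient accumulation is the usual workaround when the GPU cannot hold the batch size that text-to-latent training seems to need. Below is a minimal, self-contained sketch of that pattern; the model and data are dummies, and only the accumulation loop itself is the point.

```python
# Sketch: gradient accumulation to emulate a larger effective batch size
# when GPU memory is limited. Model and data are stand-ins, not repo code.
import torch
import torch.nn as nn

model = nn.Linear(128, 128)                       # stand-in for the diffusion network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
accum_steps = 8                                   # effective batch = micro-batch * 8

model.train()
optimizer.zero_grad()
for step in range(32):                            # 32 micro-batches -> 4 optimizer steps
    x = torch.randn(4, 128)                       # micro-batch of latent frames
    target = torch.randn(4, 128)
    loss = nn.functional.mse_loss(model(x), target)
    (loss / accum_steps).backward()               # scale so accumulated grads average
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```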

If you have any feedback on this matter, I would greatly appreciate it.

Thank you.

Paulmzr commented Sep 6, 2024

@CODEJIN Thank you for your detailed response! I will try to train it and share my findings!
