Some questions about the reproduction of this paper: from newcomer #25
Hello @wangxuanji! Thank you for your interest in 🍵 Matcha-TTS. This is an amazing, hands-on set of questions, which I really enjoyed answering.
I used a split similar to Tacotron 2, but since diffusion-type losses often do not correlate very well with model performance, I did not use the test set to evaluate any metrics during training.
Perhaps this is a heuristic of diffusion-type models: the longer you train them, the more they generally tend to improve, and we saw the same during our preliminary experiments. Speaking of RTF, it doesn't change even if the model is not trained; it is an architectural property rather than a training property. As for WER, I computed it by loading the Whisper model separately, offline. I do not suggest doing it online during training, as Whisper is a bulky model in itself and it significantly reduced the training speed when I tried transcribing in parallel.
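To make that concrete, here is a minimal sketch of such an offline WER pass. The Whisper model size, the file paths, and the use of the `jiwer` package are illustrative assumptions, not necessarily what was used for the paper:

```python
# Minimal offline WER sketch using openai-whisper + jiwer.
# The model size ("medium.en") and the paths are illustrative assumptions.
import whisper
from jiwer import wer

asr = whisper.load_model("medium.en")  # load once, outside any training loop


def corpus_wer(wav_paths, reference_texts):
    """Transcribe synthesised wavs and score them against the reference texts."""
    hypotheses = [asr.transcribe(path)["text"].strip().lower() for path in wav_paths]
    references = [text.strip().lower() for text in reference_texts]
    return wer(references, hypotheses)


# Example (hypothetical paths/texts):
# corpus_wer(["out/LJ001-0001.wav"], ["printing, in the only sense ..."])
```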
I used pytorch-hydra-template, which I found really amazing during my R&D iterations. Since my aim was not to evaluate the test set (similar to diffusion losses, the CFM loss is very noisy), I removed the eval file, which you can put back in case you need to evaluate. Then, in baselightningmodule.py, you would need to define a `test_step()` method.
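A minimal `test_step()` could look something like the sketch below. This assumes a `get_losses` helper analogous to what the training and validation steps use; the name is an assumption, so adapt it to the actual module:

```python
# Hypothetical minimal test_step for baselightningmodule.py.
# `get_losses` is assumed to mirror the loss computation used during
# training/validation; rename it to match the actual module.
def test_step(self, batch, batch_idx):
    loss_dict = self.get_losses(batch)
    for name, value in loss_dict.items():
        self.log(f"test/{name}", value, on_step=False, on_epoch=True, sync_dist=True)
    return sum(loss_dict.values())
```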
These are amazing questions; thank you very much for coming out and asking them. It will improve the experience for someone else too. I hope this answers your questions; feel free to continue the discussion in case you have any further ones. Regards,
Hello, when I saw your reply, I was very happy. Once again, I would like to express my gratitude. I have some less important questions; could you please answer them as well? I hope this hasn't caused you any trouble.
This is correct. What I meant by model changes is a change in the model architecture or in some hyperparameters, like network sizes, number of layers, number of ODE solver steps, etc.
No, not at all. Actually, all the numbers in the paper are without ONNX export. The CLI Arguments section has commands to do that for you. Specifically, if you pass a file using `matcha-tts --file <PATH TO FILE>`, the utterances will be synthesised one by one and you will get individual + mean RTF values. However, there is also a faster way: just pass `matcha-tts --file <PATH TO FILE> --batched` and it will do batched synthesis, which can be significantly faster if you have multiple utterances. Hope this helps.
Thank you for your reply. It has been very helpful to me. I wish you success in your work and a happy life!
Thank you for your kind words! I wish you the same 😄
Hi @shivammehta25, can you provide the `test_step()` function?
We didn't use the `test_step()`, so it is not defined in the repository.
Hello, I am a novice in the field of speech, and there are many things I don't understand. I hope it isn't too much trouble to answer my questions. Thank you again.
Firstly, for the dataset, I used the LJSpeech dataset from the paper, with a total of 13,100 wavs. I divided it into training, validation, and test sets in a 7:2:1 ratio. Is there any problem with my approach, and how did you divide it in your experiments?
Secondly, I have some doubts about the code. Is the best model saved according to the epoch (`monitor: epoch # name of the logged metric which determines when the model is improving`)? That would mean the model is considered better the more iterations it has run. I also didn't find any other evaluation metrics in the code, such as RTF or WER computed after a run, which may be because my coding ability is too poor; I don't quite understand this point.
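(As I understand it, that config amounts to something like the following sketch, assuming the standard Lightning `ModelCheckpoint` API; the values are illustrative:)

```python
from lightning.pytorch.callbacks import ModelCheckpoint

# monitor="epoch" with mode="max" means the "best" checkpoint is simply the
# one with the highest epoch count, i.e. the longest-trained model, rather
# than the one with the best validation metric.
checkpoint_callback = ModelCheckpoint(monitor="epoch", mode="max", save_top_k=1)
```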
Thirdly, there is a section of code in train.py:
```python
if logger:
    log.info("Logging hyperparameters!")
    utils.log_hyperparameters(object_dict)

if cfg.get("train"):
    log.info("Starting training!")
    trainer.fit(model=model, datamodule=datamodule, ckpt_path=cfg.get("ckpt_path"))

train_metrics = trainer.callback_metrics

if cfg.get("test"):
    log.info("Starting testing!")
    ckpt_path = trainer.checkpoint_callback.best_model_path
    if ckpt_path == "":
        log.warning("Best ckpt not found! Using current weights for testing...")
        ckpt_path = None
    trainer.test(model=model, datamodule=datamodule, ckpt_path=ckpt_path)
    log.info(f"Best ckpt path: {ckpt_path}")

test_metrics = trainer.callback_metrics

# merge train and test metrics
metric_dict = {**train_metrics, **test_metrics}

return metric_dict, object_dict
```
For this code, I only saw "Starting training!" during runtime, never "Starting testing!", and the testing section was not reached. When I tried running for a small number of epochs instead of the 50k iterations in the paper and then stopped it, I encountered:
```
raise MisconfigurationException(
    f"No `{step_name}()` method defined to run `Trainer.{trainer_method}`."
)
lightning.fabric.utilities.exceptions.MisconfigurationException: No `test_step()` method defined to run `Trainer.test`.
```

What is the reason for this?
These questions may seem childish to you, but they are really important to me. Could you please answer them? Let me express my gratitude to you once again.