-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix inefficient phase unwrapping #7
Fix inefficient phase unwrapping #7
Conversation
Thanks for sharing the code of this study. Reading the paper on arxiv I was astonished to see a highly redundant phase unwrapping code. I explain the problem in the code extract below ``` p = predicted phase # calculate cos/sin with phase wrapping x = torch.cos(p) y = torch.sin(p) # recalculate wrapped phase phase = torch.atan2(y, x) # calculate complex values (here with unit magnitude) from the 1j * phase S = torch.exp(phase * 1j) ``` I'd suggest changing into a 100% equivalent code that is about 3 times faster: ``` p = predicted phase # calculate cos/sin with phase wrapping x = torch.cos(p) y = torch.sin(p) S = x + 1j * y ``` You may not know that exp(jp) internally calculates cos(p) +1j * sin(p). On the other hand recalculating the phase after the first pass though cos/sin does produce a warped phase but does not avoid any of the numerical problems that may arise when p is >> 1. The problem happened already within cos/sin. Besides additional rounding errors the final S will always be what you get for the proposed change, avoiding the call to atan as well as the call to exp(1j * p). At the end this will be three times faster. See here https://github.com/roebel/vocos/blob/main/Improving_Vocos_vocoder_phase_unwrapping_issue.ipynb for a benchmark.
Thanks for pointing this out. You're absolutely right, the current implementation is awkward and indeed redundant. Now it's clear to me that we can certainly simplify it using your suggestion. Interestingly, when I trained the model with your proposed change up to 200k iterations, I noted a slight decrease in the UTMOS score. Although the forward pass seems perfectly equivalent, I suspect that the different backward computations might be influencing the training dynamics. However, I believe that with more runs, it should converge similarly. Regarding the arXiv paper, yes, it's likely to be updated. Please note that it's not yet peer-reviewed and should be considered a work in progress. |
I don't know what you consider a slight decrease. To be able to see whether this is a systematic or a random effect you would need to train multiple times and calculate the mean and std deviation of the UTMOS score, and then see whether the slight decrease is outside the std-deviation which is naturally present due to the fact that all initializations are random. It is tempting to only look at a single result but often is leading to wrong conclusions. Here given both forward paths are different only due to rounding errors, if your original approach would turn out to be systematically better I would be very surprised. I just wondered whether the difference makes a significant difference for the overall computational cost at the end. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy to accept this PR, but I guess that we no longer need this notebook in the main codebase after merging, so I suggest removing it
Notebook no longer needed.
Thanks for sharing the code of this study. Reading the paper on arxiv I was astonished to see a highly redundant phase unwrapping code. I explain the problem in the code extract below
I'd suggest changing to a 100% equivalent code that is about 3 times faster:
You may not know that exp(jp) internally calculates cos(p) +1j * sin(p). On the other hand, recalculating
the phase after the first pass through cos/sin does produce a warped phase but does not avoid any
of the numerical problems that may arise when p is >> 1. The problem happened already within
cos/sin. Besides additional rounding errors, the final S will always be what you get for the proposed change, avoiding the call to atan as well as the call to exp(1j * p). In the end, this will be three times faster.
See here
https://github.com/roebel/vocos/blob/main/Improving_Vocos_vocoder_phase_unwrapping_issue.ipynb
for a benchmark.
Personally, if I would be you, I would fix this also in the arxiv paper, it really looks strange that you recalculate again what you have already once you have performed the cos/sin operations.