Fix DAC conversion script #39793
1. The Transformers model does not use weight norm, for speed-up, so weight norm has to be removed during model conversion. The old script removed it on CPU (old script: https://github.com/huggingface/transformers/blob/8e077a3e452e8cab94ef62b37d68258bd3dcffed/src/transformers/models/dac/convert_dac_checkpoint.py#L230). This leads to slightly different weights (differences around 1e-8), and the error accumulates through the network. Removing weight norm on GPU instead produces equivalent weights (current conversion script).
2. The original version uses the Snake1D activation with JIT: https://github.com/descriptinc/descript-audio-codec/blob/c7cfc5d2647e26471dc394f95846a0830e7bec34/dac/nn/layers.py#L18
The Transformers version does not use JIT, so its outputs are slightly different.
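For point 2, a minimal pure-Python sketch of the Snake activation, assuming the form used in the original DAC `layers.py`, i.e. x + 1/(alpha + 1e-9) * sin(alpha * x)^2. It does not use TorchScript or PyTorch at all; it only evaluates the same formula in two algebraically equivalent orders (reciprocal-then-multiply vs. direct division) to illustrate that rearranged or fused arithmetic, which a JIT compiler may produce, can change results only at the rounding level. The helper names `snake_reciprocal`, `snake_divide`, and `max_abs_diff` are hypothetical.

```python
import math

# Assumption: mirrors the eps used in DAC's snake, i.e. 1 / (alpha + 1e-9).
EPS = 1e-9


def snake_reciprocal(x: float, alpha: float) -> float:
    """Snake as x + (alpha + eps)^-1 * sin(alpha * x)^2 (reciprocal, then multiply)."""
    inv_alpha = 1.0 / (alpha + EPS)
    return x + inv_alpha * math.sin(alpha * x) ** 2


def snake_divide(x: float, alpha: float) -> float:
    """The same formula, dividing by (alpha + eps) directly."""
    return x + math.sin(alpha * x) ** 2 / (alpha + EPS)


def max_abs_diff(xs, alpha: float) -> float:
    """Largest disagreement between the two evaluation orders over the inputs xs."""
    return max(abs(snake_reciprocal(x, alpha) - snake_divide(x, alpha)) for x in xs)


if __name__ == "__main__":
    xs = [i / 100.0 - 5.0 for i in range(1001)]  # grid over [-5, 5]
    for alpha in (0.5, 1.0, 3.7):
        # Any nonzero value here is float64 rounding, not a change in the function.
        print(alpha, max_abs_diff(xs, alpha))
```

Individually these gaps are tiny; the concern raised in the PR is that such per-layer discrepancies add up across a deep codec, which is why tests comparing against the original model need looser tolerances.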
Updated (definite) reason for high tolerances.
vasqu left a comment:
We should not update the conversion script if we don't also update the Hub weights. Having a legacy path is not ideal, and it is confusing for the average user when the Hub weights differ from what the script here produces.
There are two options, imo:
- Switch to the new conversion (no extra flags) and update the Hub weights
- Only keep the description of where the differences stem from
I'd prefer option 1, even if it is breaking. I would wait for Eustache's input here, though.
[For maintainers] Suggested jobs to run (before merge): run-slow: dac
Thanks @vasqu! 🚨 @eustlb (when you're back): @vasqu and I agreed offline that it would be better to:
The main reason is that several models depend on DAC (XCodec, Dia, Higgs Boson, maybe more), and it would be better if they did not depend on a model with minor output differences. Otherwise, adding or integrating those models becomes trickier, since we may not be able to tell whether differences come from DAC or from the implementation of the new model.
What does this PR do
Reproducer showing the weight difference that appears when weight norm is removed on a different device: https://gist.github.com/ebezzam/c83f186dcfeaab8cac040c960eb474cd
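The reproducer above is the actual check. As a self-contained illustration only, here is a pure-Python sketch of folding weight norm into a plain weight, assuming the standard reparameterization w = g * v / ||v||. Float32 is emulated with a `struct` round-trip (`f32`), and the two reduction orders for ||v|| (sequential vs. pairwise) are hypothetical stand-ins for how different devices or kernels might accumulate the norm; this is an assumption, not a description of what PyTorch's CPU and CUDA kernels actually do. The ~1e-8 figure quoted in the PR comes from the real checkpoints, not from this sketch.

```python
import math
import random
import struct


def f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def norm_sequential(v):
    """L2 norm with left-to-right float32 accumulation (one possible reduction order)."""
    acc = 0.0
    for x in v:
        acc = f32(acc + f32(x * x))
    return f32(math.sqrt(acc))


def norm_pairwise(v):
    """L2 norm with a pairwise (tree) float32 reduction (another possible order)."""
    sq = [f32(x * x) for x in v]
    while len(sq) > 1:
        if len(sq) % 2:
            sq.append(0.0)  # pad odd-length levels with a neutral element
        sq = [f32(sq[i] + sq[i + 1]) for i in range(0, len(sq), 2)]
    return f32(math.sqrt(sq[0]))


def remove_weight_norm(g, v, norm_fn):
    """Fold w = g * v / ||v|| into a plain float32 weight vector."""
    scale = f32(g / norm_fn(v))
    return [f32(x * scale) for x in v]


if __name__ == "__main__":
    rng = random.Random(0)
    v = [f32(rng.uniform(-1.0, 1.0)) for _ in range(4096)]
    g = f32(0.5)
    w_seq = remove_weight_norm(g, v, norm_sequential)
    w_pair = remove_weight_norm(g, v, norm_pairwise)
    # Same math, different summation order: any nonzero gap is pure float32 rounding.
    print("max |w_seq - w_pair| =", max(abs(a - b) for a, b in zip(w_seq, w_pair)))
```

Both folded weights are valid in isolation; the point is that a converted checkpoint only matches the original model bit-for-bit if the fold is computed the same way the original model computes w at inference time.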