Refactor optimizer step logic #163
Conversation
Yeah, I even had a comment about it in the code :)
Maybe it's better to merge this PR as is? We can then remove mixed precision in a PR that involves a docs change and could impact downstream projects such as isaacgymenvs.
Okay! I'll review all the other PRs after work in the evening.
Also pinging @markelsanz14, are you ok with me filing a PR removing mixed-precision support? I know you were thinking about using it in the future.
@Denys88 do we know how much removing mixed precision will affect different model sizes? I agree it won't help in most cases, mainly when we have 2-5 layer NNs, but it might speed up training with larger NNs.
Hey @Denys88. @ViktorM, @markelsanz14, and I had a quick chat about this, and maybe it's worth keeping the mixed-precision option? I did a quick benchmark and found virtually no performance difference. However, as @markelsanz14 suggested, mixed precision could offer benefits in large NNs.
Yeah, I wouldn't put more work into it for now, but removing it altogether seems a bit drastic. I feel like it can be useful in the future with larger NN architectures.
Could you check it on some heavy networks, including transformers?
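A hedged sketch of such a check, for reference: the snippet below times fp32 vs. mixed-precision training steps on a PyTorch transformer encoder. The model size, batch shape, and iteration counts are arbitrary illustrative choices, not anything from this PR.

```python
import time
import torch
import torch.nn as nn

# Benchmark sketch: fp32 vs. mixed-precision steps on a transformer encoder.
# Assumes a CUDA device; all sizes below are arbitrary illustrative choices.
device = "cuda"
model = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=6,
).to(device)
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()
x = torch.randn(64, 128, 512, device=device)

def step(mixed_precision: bool) -> None:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=mixed_precision):
        loss = model(x).pow(2).mean()  # dummy loss, just to drive backward()
    if mixed_precision:
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()
    else:
        loss.backward()
        optimizer.step()

for mp in (False, True):
    for _ in range(10):  # warmup
        step(mp)
    torch.cuda.synchronize()
    t0 = time.time()
    for _ in range(50):
        step(mp)
    torch.cuda.synchronize()
    print(f"mixed_precision={mp}: {(time.time() - t0) / 50 * 1e3:.1f} ms/step")
```

On small 2-5 layer MLPs the two timings tend to be close, which is consistent with the benchmark result mentioned above; the gap usually only opens up on larger, tensor-core-friendly layers.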
The current implementation is problematic when not doing gradient truncation. The following file diff shows the lines that need to be added.
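The file diff itself is not reproduced in this excerpt. As a hedged guess at the kind of lines involved: with PyTorch AMP, making gradient truncation optional naively requires duplicating the scaler's step/update calls in both branches, since gradients must be unscaled before clipping. All names below (`update_weights_naive`, `truncate_grads`, `grad_norm`) are illustrative, not the PR's actual identifiers.

```python
import torch

def update_weights_naive(model, optimizer, scaler, loss,
                         truncate_grads, grad_norm):
    # Naive mixed-precision update: the step/update calls end up
    # duplicated across the two branches. `scaler` is assumed to be
    # a torch.cuda.amp.GradScaler instance.
    scaler.scale(loss).backward()
    if truncate_grads:
        # Gradients must be unscaled before clipping.
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), grad_norm)
        scaler.step(optimizer)
        scaler.update()
    else:
        scaler.step(optimizer)
        scaler.update()
```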
But this is pretty ugly... so I refactored it to
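The refactored code is likewise not shown in this excerpt. A minimal sketch of one plausible consolidation, under the same assumptions as above: clip only when requested, and keep a single step/update path.

```python
import torch

def update_weights(model, optimizer, scaler, loss,
                   truncate_grads, grad_norm):
    # Refactored update: optional clipping, single step/update path.
    scaler.scale(loss).backward()
    if truncate_grads:
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), grad_norm)
    scaler.step(optimizer)
    scaler.update()
```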