About nan and inf grads #43
Comments
Hi, we are using fp16 training with a GradScaler. The GradScaler should take care of NaN/infs. So to answer your question, the NaNs should not affect the training. But let me know if you have issues.
Thank you so much for your reply. I guess it won't be a problem if NaN/inf grads only appear once every few dozen steps?
Exactly, but make sure you're using the GradScaler (a minimal sketch follows below). Actually I had a lot of issues getting fp16 training to be stable, so let me know if you run into any other issues.
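The thread doesn't show the training loop itself, so the following is only a minimal sketch of the standard `torch.cuda.amp` pattern being referred to; the model, optimizer, and `dataloader` are placeholder names, not the project's actual code.

```python
import torch

# Placeholder model/optimizer; substitute the project's own.
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

for inputs, targets in dataloader:  # dataloader is assumed to exist
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(
            model(inputs.cuda()), targets.cuda()
        )
    # scale() multiplies the loss so small fp16 gradients don't underflow;
    # step() skips the optimizer update if inf/NaN gradients are detected,
    # and update() adjusts the scale factor for the next iteration.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

This is why occasional NaN/inf gradients are tolerable: the scaler simply skips that step and lowers the loss scale rather than corrupting the weights.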
Is this the backward of the E solver? I haven't tried it myself, but perhaps if you save the inputs to the solver you may be able to find the issue. Maybe you picked the same correspondence twice?
I found out the problem is caused by the backward of torch.linalg.solve in the E solver. I added a regularization term to A and that solved the problem. Thank you a lot for your kind reply.
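The comment doesn't include the exact fix, but a typical regularization of this kind adds a small multiple of the identity to A before solving (Tikhonov-style damping). The sketch below is an illustration only; `regularized_solve` and `eps` are hypothetical names and values, not from the repository.

```python
import torch

def regularized_solve(A, b, eps=1e-6):
    # Adding eps * I makes A better conditioned, which keeps the backward
    # of torch.linalg.solve from producing NaN/inf gradients when A is
    # near-singular. eps is an assumed value; tune it for your problem.
    I = torch.eye(A.shape[-1], dtype=A.dtype, device=A.device)
    return torch.linalg.solve(A + eps * I, b)
```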
I found that when performing backward, there are sometimes warnings like:

Do these NaN or inf grads have bad effects on training?