About nan and inf grads #43

Closed
lnexenl opened this issue Jun 9, 2024 · 6 comments

Comments

@lnexenl
Contributor

lnexenl commented Jun 9, 2024

I found that when performing the backward pass, warnings like this sometimes appear:
image

Do these nan or inf grads have a bad effect on training?

@Parskatt
Owner

Parskatt commented Jun 9, 2024

Hi, we are using fp16 training with a gradscaler. The gradscaler should take care of nan/infs.

So to answer your question, the nans should not affect the training. But let me know if you have issues.
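For readers unfamiliar with the pattern, here is a minimal sketch of an fp16 training step with PyTorch's `GradScaler`. It is not this repository's training loop: the model, optimizer, and `train_step` helper are placeholders made up for illustration. When the scaler finds inf/nan gradients it skips that optimizer step and lowers the loss scale, which is why occasional warnings are expected to be harmless.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"  # fp16 autocast and loss scaling only on GPU
amp_dtype = torch.float16 if use_amp else torch.bfloat16

# Placeholder model and optimizer (assumptions, not the repository's).
model = nn.Linear(8, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
# Newer PyTorch releases also expose this class as torch.amp.GradScaler.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(x, y):
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device.type, dtype=amp_dtype, enabled=use_amp):
        loss = nn.functional.mse_loss(model(x), y)
    # Scale the loss so small fp16 gradients do not underflow.
    scaler.scale(loss).backward()
    # step() unscales the grads and skips optimizer.step() if any are inf/nan.
    scaler.step(optimizer)
    # update() shrinks the scale after a skipped step, grows it otherwise.
    scaler.update()
    return loss.item()
```

If a custom loop calls `optimizer.step()` directly instead of `scaler.step(optimizer)`, this protection is lost and a single inf/nan gradient can corrupt the weights.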

@lnexenl
Contributor Author

lnexenl commented Jun 9, 2024

Thank you so much for your reply. I guess it won't be a problem if nan/inf grads only appear once every few dozen steps?

@Parskatt
Owner

Parskatt commented Jun 9, 2024

Exactly, but make sure you're using the gradscaler. Actually I had a lot of issues getting fp16 training to be stable, so let me know if you get any other issues.

@lnexenl
Contributor Author

lnexenl commented Jun 10, 2024

I ran into some issues in the backward pass when training:
image

Have you ever run into this problem? I add an epipolar error term by estimating the E matrix during training.

@Parskatt
Owner

Is this the backward of the E solver?

I haven't tried myself, but perhaps if you save the inputs to the solver you may be able to find the issue.

Maybe you picked the same correspondence twice?
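One way to test the duplicate-correspondence guess is to check the sampled matches before they reach the solver. The sketch below assumes the correspondences are stored as an `(N, 4)` tensor of `(x1, y1, x2, y2)` rows. That layout and the helper name are assumptions for illustration, not something stated in this thread.

```python
import torch


def has_duplicate_correspondences(corr: torch.Tensor) -> bool:
    """Return True if any row of an (N, 4) correspondence tensor repeats exactly."""
    # torch.unique with dim=0 collapses identical rows.
    unique_rows = torch.unique(corr, dim=0)
    return unique_rows.shape[0] < corr.shape[0]


# Hypothetical minimal sample in which the first match was drawn twice.
sample = torch.tensor([
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.1, 0.2, 0.3, 0.4],
])
print(has_duplicate_correspondences(sample))
```

Repeated rows make the solver's linear system rank-deficient, which is one plausible way for its backward pass to fail.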

@lnexenl
Contributor Author

lnexenl commented Jun 11, 2024

I found that the problem is caused by the backward of torch.linalg.solve in the E solver. I added a regularization term to A, which solved the problem. Thanks a lot for your kind reply.
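The exact regularization used is not shown in this thread. A common choice, assumed here, is a Tikhonov-style shift that adds a small multiple of the identity to `A` before solving. The helper name `regularized_solve` and the value of `eps` are illustrative only.

```python
import torch


def regularized_solve(A: torch.Tensor, b: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Solve (A + eps * I) x = b. The shift is an assumed form of the regularization."""
    eye = torch.eye(A.shape[-1], dtype=A.dtype, device=A.device)
    return torch.linalg.solve(A + eps * eye, b)


# A singular 2x2 system: the second row is twice the first.
A = torch.tensor([[1.0, 2.0], [2.0, 4.0]], dtype=torch.float64, requires_grad=True)
b = torch.tensor([[1.0], [1.0]], dtype=torch.float64)

x = regularized_solve(A, b)
x.sum().backward()
print(torch.isfinite(x).all().item(), torch.isfinite(A.grad).all().item())
```

The shift guarantees invertibility only when `A` is positive semidefinite (for example a normal-equations matrix); for a general `A` it usually helps but is not a guarantee. It also biases the solution slightly, and in fp16 or fp32 a larger `eps` than the one shown may be needed.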

@lnexenl lnexenl closed this as completed Jun 11, 2024

2 participants