About nan and inf grads #43
Comments
Hi, we are using fp16 training with a GradScaler. The GradScaler should take care of NaN/infs. So to answer your question, the NaNs should not affect the training. But let me know if you have issues.
Thank you so much for your reply. I guess it won't be a problem if NaN/inf grads only appear once every few dozen steps?
Exactly, but make sure you're using the GradScaler (a minimal sketch follows below). Actually I had a lot of issues getting fp16 training to be stable, so let me know if you run into any other issues.
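The thread doesn't show the training loop itself, so the following is only a minimal sketch of the standard `torch.cuda.amp` pattern being referred to; the model, optimizer, and `dataloader` are placeholder names, not the project's actual code.

```python
import torch

# Placeholder model/optimizer; substitute the project's own.
model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

for inputs, targets in dataloader:  # dataloader is assumed to exist
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(
            model(inputs.cuda()), targets.cuda()
        )
    # scale() multiplies the loss so small fp16 gradients don't underflow;
    # step() skips the optimizer update if inf/NaN gradients are detected,
    # and update() adjusts the scale factor for the next iteration.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

This is why occasional NaN/inf gradients are tolerable: the scaler simply skips that step and lowers the loss scale rather than corrupting the weights.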
Is this the backward of the E solver? I haven't tried it myself, but perhaps if you save the inputs to the solver you may be able to find the issue. Maybe you picked the same correspondence twice?
I found out the problem is caused by the backward of torch.linalg.solve in the E solver. I added a regularization term to A and that solved the problem. Thank you a lot for your kind reply.
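The comment doesn't include the exact fix, but a typical regularization of this kind adds a small multiple of the identity to A before solving (Tikhonov-style damping). The sketch below is an illustration only; `regularized_solve` and `eps` are hypothetical names and values, not from the repository.

```python
import torch

def regularized_solve(A, b, eps=1e-6):
    # Adding eps * I makes A better conditioned, which keeps the backward
    # of torch.linalg.solve from producing NaN/inf gradients when A is
    # near-singular. eps is an assumed value; tune it for your problem.
    I = torch.eye(A.shape[-1], dtype=A.dtype, device=A.device)
    return torch.linalg.solve(A + eps * I, b)
```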
I found that when performing backward, there are sometimes warnings like:

Do these NaN or inf grads have bad effects on training?