sym.Sqrt gradient inf? #2261
Comments
Hmmm, it's even more strange. The gradient comes from a reshape symbol. Shape is (1,1); see the log below.
Am I not understanding something here? Doing the reshape before the sqrt gave the same result.
@juliandewit Could you provide a code example? In that case I can replicate the error and debug locally.
The project as-is is too complex, but I'll try to come up with something more isolated.
Great!
I have an example that gives another problem, but that might explain my issues. Minibatch size = 1, so all shapes are (1,1). Am I doing something wrong? Am I using the software incorrectly? Thanks in advance.
I've checked the code. 0.0716871 here is the loss value, which is the output of a sqrt op. The gradient is thus 0.5/0.0716871 = 6.974. Also, I've found a bug in the MakeLoss op in https://github.com/dmlc/mxnet/blob/master/src/operator/make_loss-inl.h#L55 that causes the MakeLoss output to be zero. Since directly using A = B will not copy the value (dmlc/mshadow#50), we need to use …
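To sanity-check the arithmetic in the comment above, a minimal sketch (the value 0.0716871 is taken from the thread; the formula is the standard derivative of sqrt):

```python
import math

# Forward pass: loss = sqrt(x); the observed output in the thread was 0.0716871.
loss = 0.0716871

# Backward pass: d/dx sqrt(x) = 0.5 / sqrt(x) = 0.5 / loss,
# i.e. the gradient is computed from the *output* of the sqrt op.
grad = 0.5 / loss
print(grad)  # ~6.974, matching the value quoted in the comment
```

This also shows why the gradient blows up: as the loss approaches 0, 0.5 / loss approaches infinity.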
The gradient of sqrt at 0 is inf, so it's not a stable loss function.
Thanks. |
Hello, I am experimenting a bit with composing layers.
Now my network is a bit unstable: using the monitor, I get "inf" for the gradient of a sqrt symbol.
Looking at the gradient pass of the sqrt symbol, I see:
```cpp
struct square_root_grad {
  template<typename DType>
  MSHADOW_XINLINE static DType Map(DType a) {
    return DType(DType(0.5f) / a);
  }
};
```
Now I'm not 100% sure how everything works, but doesn't this pose a problem if the incoming value (a) == 0?
It would explain my issues.
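In IEEE-754 floating-point arithmetic, dividing a finite nonzero number by zero yields inf, so if `a` is 0 the gradient is indeed infinite. A quick NumPy illustration of the same division the C++ snippet performs:

```python
import numpy as np

a = np.float32(0.0)  # the value fed into square_root_grad's Map()
with np.errstate(divide="ignore"):  # suppress the divide-by-zero warning
    grad = np.float32(0.5) / a      # mirrors DType(0.5f) / a
print(grad)  # inf
```

So any path that feeds a zero into this gradient kernel produces an infinite gradient, which then propagates through the rest of the backward pass.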