sym.Sqrt gradient inf ? #2261

Closed

juliandewit opened this issue May 27, 2016 · 8 comments

Comments

@juliandewit

juliandewit commented May 27, 2016

Hello, I am experimenting a bit with composing layers.
My network is now a bit unstable: using the monitor, I get "inf" for the gradient of a sqrt symbol.

Looking at the gradient pass of the sqrt symbol, I see:

```cpp
struct square_root_grad {
  template<typename DType>
  MSHADOW_XINLINE static DType Map(DType a) {
    // Elementwise gradient kernel: returns 0.5 / a.
    return DType(DType(0.5f) / a);
  }
};
```

I'm not 100% sure how everything works, but doesn't this pose a problem if the incoming gradient (a) == 0?

It would explain my issues.
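For intuition, here is a minimal NumPy sketch (illustrative values, not from this run) of how a `0.5 / a` kernel behaves as `a` approaches zero:

```python
import numpy as np

# Same formula as square_root_grad::Map, applied elementwise.
a = np.array([1.0, 0.5, 1e-4, 0.0])
print(0.5 / a)
# -> [0.5  1.0  5000.0  inf], plus a divide-by-zero RuntimeWarning for the last entry
```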

@juliandewit
Author

juliandewit commented May 27, 2016

Hmm, it's even more strange.
I set the minibatch size to 1.

The gradient comes from a reshape symbol with shape (1,1),
so there is only one value, which is 0.499....
As I understand it, this flows into the sqrt, and then the gradient (also shape (1,1)) is 99.931.

See the log below:

```
INFO:root:Batch:       1 reshape3_backward_data         0.499396
INFO:root:Batch:       1 sqrt1_backward_data            99.931
```

Am I not understanding something here?
I would expect 0.5 / 0.499 ≈ 1.

Doing the reshape before the sqrt gave the same result.
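For reference, a quick NumPy check of what these numbers imply (later in the thread it is confirmed that the kernel's `a` is the sqrt op's *output*, not the incoming gradient):

```python
import numpy as np

incoming_grad = 0.499396   # reshape3_backward_data from the log
observed_grad = 99.931     # sqrt1_backward_data from the log

# If a were the incoming gradient, the kernel would give ~1, as expected here:
print(0.5 / incoming_grad)                     # ~1.0012

# If instead the kernel divides by the op's output (chain rule:
# in_grad = incoming_grad * 0.5 / output), the output consistent with the log is:
print(incoming_grad * 0.5 / observed_grad)     # ~0.0025, a sqrt output near zero
```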

@sxjscience
Member

@juliandewit Could you provide a code example? That way I can reproduce the error and debug locally.

@juliandewit
Author

The project as-is is too complex, but I'll try to come up with something more isolated.

@sxjscience
Member

Great!

@juliandewit
Author

juliandewit commented May 27, 2016

I have an example that exhibits another problem, but it might explain my issues.
It most probably does not work, but I'm looking at the first monitor output and it confuses me.

Minibatch size = 1, so all shapes are (1,1):

- Y = 0.071667
- output = almost zero
- The square and sqrt outputs look correct.
- The MakeLoss output is '0', which seems strange to me, but could be right...
- The MakeLoss gradient is '1', which also seems strange, but could be right.
- However, the sqrt gradient is '6.974...', which I just do not understand.

Am I doing something wrong? Am I using the software incorrectly?

custom_loss.txt

monitor1.txt

Thanks in advance..

@sxjscience
Member

I've checked the code. The 0.0716871 here is the value of the loss function, i.e. the output of the sqrt op. The gradient kernel divides by that output, so the gradient is 0.5 / 0.0716871 = 6.974.

Also, I've found a bug in the MakeLoss op at https://github.com/dmlc/mxnet/blob/master/src/operator/make_loss-inl.h#L55 that causes the MakeLoss output to be zero. Since directly writing A = B will not copy the value (dmlc/mshadow#50), we need to use 1.0f * data instead. I'll make a PR for this.
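A quick check of that arithmetic in Python, using the values from the monitor log above:

```python
sqrt_output = 0.0716871  # the loss value, i.e. the sqrt op's output
head_grad = 1.0          # MakeLoss backpropagates a gradient of 1
print(head_grad * 0.5 / sqrt_output)  # ~6.9748, matching the monitored '6.974...'
```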

@piiswrong
Contributor

The gradient of sqrt at 0 is inf, so it's not a stable loss function.
Use sqrt(1 + relu(x)) to get a stable loss.
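A minimal NumPy sketch (my own illustration, not from the thread) comparing the raw sqrt gradient with the stabilized version near zero:

```python
import numpy as np

x = np.array([1.0, 1e-4, 1e-8])

# d/dx sqrt(x) = 0.5 / sqrt(x): unbounded as x -> 0.
print(0.5 / np.sqrt(x))              # [0.5, 50.0, 5000.0]

# d/dx sqrt(1 + relu(x)) = 0.5 / sqrt(1 + relu(x)) for x > 0: bounded by 0.5.
relu = np.maximum(x, 0.0)
print(0.5 / np.sqrt(1.0 + relu))     # all ~0.5, stable near zero
```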

@juliandewit
Author

juliandewit commented May 28, 2016

Thanks.
OK, conclusion:
the gradient of sqrt is working as expected;
it's only unstable when the sqrt output is 0,
so I will have to stabilize it.
I'll also pull the MakeLoss fix to be sure.
Closed.
