Got -9223372036854775808 (-2^63) when differentiating a sequential sum of squares #370

Open
zkytony opened this issue Mar 12, 2018 · 3 comments


zkytony commented Mar 12, 2018

I have the following objective function. Given an N by M matrix x, it computes the sum of squared norms of the differences between adjacent rows. (For instance, if x is a sequence of 2D locations, then this function measures the total displacement along the sequence.)

def min_displacement_obj(x):
    """
    Computes:
      T-1
      sum ||x_{t+1} - x_{t}||^2
      t=1
    """
    return np.sum(np.array([np.linalg.norm(x[i] - x[i-1])**2
                            for i in range(1, len(x))]))

I'd like to compute the gradient of this function with respect to x, so I used autograd.grad. But for the example below,

import autograd.numpy as np
from autograd import grad

f = min_displacement_obj
grad_f = grad(f)

x = np.array([[1],[0],[0],[1]])
grad_fval = grad_f(x)
print(grad_fval)

I get

[[                   2]
 [-9223372036854775808]
 [-9223372036854775808]
 [                   2]]

This doesn't look right (unless I made a silly mistake): if I manually compute the gradient of f with respect to a component of x, say x[i], the formula I get is:

df(x)/dx[i] = d(||x[i] - x[i-1]||^2 + ||x[i+1] - x[i]||^2)/dx[i] = 4x[i] - 2x[i-1] - 2x[i+1]

Then, in the above example, when i=1, df(x)/dx[1] = 4*0 - 2*1 - 2*0 = -2. How come it becomes -9223372036854775808? I'm very confused.

However, if I just change x to

x = np.array([[1],[0],[0.1],[1]])

I get

[[ 2. ]
 [-2.2]
 [-1.6]
 [ 1.8]]

This is correct, because again when i=1, df(x)/dx[1] = 4*0 - 2*1 - 2*0.1 = -2.2

I found that autograd doesn't compute the correct result whenever adjacent components of x have the same value. It uniformly outputs -9223372036854775808 for the gradient w.r.t. all of those components. Is this a bug?
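
For what it's worth, a quick central finite-difference check (just a sketch, reusing min_displacement_obj from above and casting x to float) gives the analytic values rather than the huge negative number:

def finite_diff_grad(f, x, eps=1e-6):
    # approximate df/dx component by component with central differences
    g = np.zeros(x.shape)
    for k in range(x.size):
        xp, xm = x.copy(), x.copy()
        xp.flat[k] += eps
        xm.flat[k] -= eps
        g.flat[k] = (f(xp) - f(xm)) / (2 * eps)
    return g

x_float = np.array([[1.0], [0.0], [0.0], [1.0]])
print(finite_diff_grad(min_displacement_obj, x_float))
# analytic gradient: [[ 2.], [-2.], [-2.], [ 2.]]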

@dhirschfeld
Contributor

Think it's integer overflow; i.e., if you ensure your input array is floating point, it will likely work.

If so, it might still be considered a bug in autograd, since silently returning the wrong value is pretty nasty, even if the returned value is likely to be obviously wrong.
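
A minimal sketch of that suggestion, reusing f and grad_f from the original post:

x = np.array([[1], [0], [0], [1]], dtype=float)  # float array instead of int
print(grad_f(x))

As the next comment shows, the -2^63 garbage then disappears, but NaNs still appear in the entries where adjacent rows are identical.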


neonwatty commented Mar 14, 2018

I tried your code and converted your input x = np.array([[1],[0],[0],[1]]) to floats first, but got the runtime warning

RuntimeWarning: invalid value encountered in double_scalars
  return expand(g / ans) * x

and a corresponding output with nans of

[[  2.]
 [ nan]
 [ nan]
 [  2.]]

Perhaps something is going on with np.linalg? I rewrote what you gave above without np.linalg, using np.sum instead, calling it min_displacement_obj_2:

def min_displacement_obj_2(x):
    """
    Computes:
      T-1
      sum ||x_{t+1} - x_{t}||^2
      t=1
    """    
    return np.sum([np.sum(x[i] - x[i-1])**2  for i in range(1, x.shape[0])])

This works fine given your test vector:

f = min_displacement_obj_2
grad_f = grad(f)

x = np.array([[1.0],[0.0],[0.0],[1.0]])
x = x.astype(float)
grad_fval = grad_f(x)
print(grad_fval)

[[ 2.]
 [-2.]
 [-2.]
 [ 2.]]

This won't necessarily work on the rows of a matrix, but if I understand you correctly, you can extend it as min_displacement_obj_3 below:

def min_displacement_obj_3(x):
    """
    Computes:
      T-1
      sum ||x_{t+1} - x_{t}||^2
      t=1
    """    
    return np.sum([np.sum(x[i, :] - x[i-1, :], axis=0)**2 for i in range(1, x.shape[0])])

This still works fine for the test vector, and it seems to work for a test matrix as well, e.g.,

grad_f = grad(min_displacement_obj_3)

X = np.array([[1,1,1],[2,2,2],[3,3,3],[4,4,4]])
X = X.astype(float)
grad_fval = grad_f(X)
print(grad_fval)

[[-6. -6. -6.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 6.  6.  6.]]
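
Along the same lines, a fully vectorized variant (just a sketch) avoids both the Python loop and np.linalg.norm by squaring the row differences elementwise before summing, which matches the ||x_{t+1} - x_{t}||^2 in the docstring directly:

def min_displacement_obj_vec(x):
    # sum over t of ||x_{t+1} - x_{t}||^2, computed with no sqrt anywhere
    return np.sum((x[1:] - x[:-1]) ** 2)

grad_f = grad(min_displacement_obj_vec)
print(grad_f(np.array([[1.0], [0.0], [0.0], [1.0]])))
# expected: [[ 2.], [-2.], [-2.], [ 2.]]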

@bewantbe

This is probably a problem that essentially all automatic differentiation algorithms will encounter.

A minimal example looks like this (run it twice to get rid of the warning :-)):

from autograd import grad
import autograd.numpy as np

func = lambda x: np.sqrt(x**2)**2
x = np.array([0.0])
grad_fval = grad(func)(x)
print(grad_fval)

Mathematically, the reason is clear when you write down the chain rule:

$ \frac{d}{dx}\left(\sqrt{x^2}\right)^2 = (2 w_2) \cdot \frac{1}{2\sqrt{w_1}} \cdot (2x), \quad w_1 = x^2,\ w_2 = \sqrt{x^2}. $

When x = 0, it is something like 0 * (1/0) * 0, a typical indeterminate form.

In other words, the function norm() creates a discontinuity of the derivative at the origin, but then you square it, which makes the composite smooth again. The program is just not clever enough to eliminate this spurious discontinuity.
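
To make the point concrete, here is a minimal comparison (a sketch, same imports as in the example above):

print(grad(lambda x: np.linalg.norm(x))(np.array([0.0])))  # [nan] (with a RuntimeWarning): (1/0) * 0 in the norm's derivative
print(grad(lambda x: np.sum(x ** 2))(np.array([0.0])))     # [0.]: no sqrt, so no indeterminate form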

Luckily, as @jermwatt pointed out, there is a warning, and this warning should not be ignored. The expression expand(g / ans) * x in the message, essentially g / ans * x, is the derivative of the L2 norm times the incoming gradient g.

BTW: I'm worried about this "bug", because a highly nonlinear objective function might, if unlucky enough, hit this problem. Do we have a guarantee that the warning about a non-smooth or indeterminate result is always shown?
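
One way to make sure it is never missed (just a sketch, using NumPy's standard floating-point error handling) is to promote these warnings to errors around the grad call:

import numpy

with numpy.errstate(divide='raise', invalid='raise'):
    # any 1/0 or inf * 0 inside the backward pass now raises FloatingPointError
    # instead of only printing a RuntimeWarning
    grad_fval = grad(func)(np.array([0.0]))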

bewantbe added a commit to bewantbe/autograd that referenced this issue May 3, 2018
Fix issues HIPS#370

Now the gradient of np.linalg.norm() at the zero point (the origin) is the same as that of np.abs, which is zero, one of its subgradients.
For second-order gradients, mathematically they should be +infinity, but here, when ord >= 2, it returns 0 (same as np.abs()); when 1 < ord < 2, it is NaN with plenty of warnings, which should be enough to keep users from doing that.