[bug] Autograd throws an exception that was not caught in MXNet 1.6 #18789
Comments
@sxjscience, @eric-haibin-lin do you have any idea what happened? I hit the same problem on an AWS p3 instance, and the error message persists after I upgraded MXNet to version 2.0. |
I think the numpy array support in autograd.function was missed. #18790 |
@szha Thanks for your help, but I think the problem also exists for |
Did you run the above as a script? Or in an interactive python shell? I want to see if the program terminated immediately after the execution. |
I ran the above code as a script.
>>> from mxnet import nd
>>> import mxnet as mx
>>> x = mx.np.zeros((10,))
>>> x.attach_grad()
>>> class Op(mx.autograd.Function):
...     def forward(self, x):
...         out = x + 1
...         return out
...     def backward(self, grad):
...         grad_x = grad
...         return grad_x
...
>>> op = Op()
>>> with mx.autograd.record():
...     y = op(x)
...     y.sum().backward()
...
>>> print(x.grad)
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
>>>
>>>
Error in sys.excepthook:
Original exception was: |
#18768 has probably fixed it. Would you try the nightly build and see if this is still an issue? |
@szha no, the problem still exists. |
In [3]: from mxnet import nd
   ...: import mxnet as mx
   ...: x = mx.np.zeros((10,))
   ...: x.attach_grad()
   ...: mx.npx.set_np()
   ...: class Op(mx.autograd.Function):
   ...:     def forward(self, x):
   ...:         out = x + 1
   ...:         return out
   ...:
   ...:     def backward(self, grad):
   ...:         grad_x = grad
   ...:         return grad_x
   ...:
   ...: op = Op()
   ...: with mx.autograd.record():
   ...:     y = op(x)
   ...:     y.sum().backward()
   ...:
   ...: print(x.grad)
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
In [4]:
Do you really want to exit ([y]/n)? |
Actually I cannot reproduce this error. @yzh119 Would you try again with the latest nightly version and with the following code snippet? To install the nightly version:
from mxnet import nd
import mxnet as mx
mx.npx.set_np()  # switch to the NumPy-compatible interface before creating arrays
x = mx.np.zeros((10,))
x.attach_grad()
class Op(mx.autograd.Function):
    def forward(self, x):
        out = x + 1
        return out
    def backward(self, grad):
        # d(x + 1)/dx = 1, so the incoming gradient passes through unchanged
        grad_x = grad
        return grad_x
op = Op()
with mx.autograd.record():
    y = op(x)
    y.sum().backward()
print(x.grad) |
@sxjscience using the nightly build version does not work either:
[11:46:12] ../src/storage/storage.cc:198: Using Pooled (Naive) StorageManager for CPU
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
Error in sys.excepthook:
Original exception was: |
Which OS are you using? |
I confirmed that it won't happen if you just run it inside jupyter notebook and will only happen if you save to |
@sxjscience sounds like a release order problem again. Will take a look |
I didn't find a way to get access to any actual error and despite the message the program exited normally. |
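One way to check the observation above (the message appears, yet the process exits normally) is to run the reproduction as a saved script in a fresh interpreter and inspect both the exit code and the captured stderr. The sketch below is only an illustration: the helper name `run_as_script` and the `MARKER` constant are hypothetical, and it assumes the "Error in sys.excepthook" text is written to stderr, as Python normally does at interpreter shutdown.

```python
import os
import subprocess
import sys
import tempfile

# Text seen in the reports in this thread; assumed to be written to stderr.
MARKER = "Error in sys.excepthook"


def run_as_script(source):
    """Save `source` to a temporary .py file, run it in a fresh interpreter,
    and return (exit_code, marker_seen, stdout)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(source)
        path = f.name
    try:
        proc = subprocess.run(
            [sys.executable, path], capture_output=True, text=True
        )
    finally:
        os.remove(path)  # clean up the temporary script
    return proc.returncode, MARKER in proc.stderr, proc.stdout
```

Passing the reproduction snippet from this thread as `source` would show whether the message is emitted while the exit code is still 0 on a given machine; the helper itself does not depend on MXNet.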
I could not reproduce it with MXNet (CPU-only) 1.6 and 2.0 on Arch Linux, even if running BTW, I used Python 3.8.5. I reproduced it on Ubuntu, Python 3.8.3, MXNet (CPU-only) 1.6/2.0. |
I was only able to reproduce the problem on macOS with Python 3.7, not with Python 3.8.
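Since the reports above differ by OS, Python version, and MXNet version, it may help for each reproduction attempt to include the same environment details. A minimal sketch, assuming only the standard library plus the `__version__` attribute of the `mxnet` package; the function name `environment_report` is made up for illustration:

```python
import platform


def environment_report():
    """Collect the details that varied between the reports in this thread."""
    info = {
        "os": platform.platform(),
        "python": platform.python_version(),
    }
    try:
        import mxnet  # imported lazily so the report works without MXNet
        info["mxnet"] = mxnet.__version__
    except Exception as exc:  # missing package or a broken native library
        info["mxnet"] = "unavailable ({})".format(type(exc).__name__)
    return info
```

Printing the returned dictionary alongside the script output gives a comparable record for each machine.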
|
Description
The Autograd module throws an exception that was not caught:
After the execution of the program.
To Reproduce
Below is a minimal example to reproduce the bug:
However, the program still prints the result successfully.
Environment