TBlob bug about dltensor #15931
Comments
Hey, this is the MXNet Label Bot.
@mxnet-label-bot add [bug]
I also encountered the illegal-memory-access error, which probably results from the root cause described here (I called
I created a fix in #15937 (comment) by defining the copy constructor and the copy assignment operator so that each explicitly calls SetDLTensor(), as @reminisce suggested.
Fixed by #15937. Issue closed.
Description
TBlob does not disable or overload the default copy constructor and copy assignment operator, so the compiler-generated versions are used. They make a shallow copy of dltensor_ (a field of type DLTensor in TBlob, see here), which can leave its shape pointer dangling and cause illegal memory access.
Environment info (Required)
Python 3.7.3
Built from source (master at 5a4c01b)
Minimum reproducible example
To reproduce this error, I made a minor change to the function NumpyDotForward (in src/operator/numpy/np_dot-inl.h) for illustration.
Here is the function after my modification.
I modified one line, and added two lines (denoted by comments):
Steps to reproduce
The expected result is
But the real result is
Possible cause of this problem
TBlob.dltensor_.shape is a pointer. When TBlob b is assigned to TBlob a, the pointer gets shallow copied:
But b.dltensor_.shape points to b.shape_.data(). So when b is a temporary (such as the return value of TBlob.reshape()), b.shape_ is destroyed together with b once the assignment has completed, and a.dltensor_.shape is left pointing to invalid memory.
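The aliasing described above can be shown with a small stand-alone sketch. `MiniDLTensor`, `MiniBlob`, and `PointsToOwnShape` are hypothetical stand-ins invented for illustration, and `std::vector` replaces MXNet's shape type, so this is not the real TBlob code. It only shows that compiler-generated copies deep-copy the shape while the cached pointer keeps referring to the source object.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for DLTensor: holds a non-owning pointer to the shape.
struct MiniDLTensor {
  const int64_t* shape = nullptr;
  int ndim = 0;
};

// Hypothetical stand-in for TBlob with compiler-generated copy operations.
struct MiniBlob {
  std::vector<int64_t> shape_;  // owns the shape storage
  MiniDLTensor dltensor_;       // caches a pointer into shape_

  explicit MiniBlob(std::vector<int64_t> shape) : shape_(std::move(shape)) {
    SetDLTensor();
  }

  // Re-point the cached DLTensor at this object's own shape storage.
  void SetDLTensor() {
    dltensor_.shape = shape_.data();
    dltensor_.ndim = static_cast<int>(shape_.size());
  }
};

// True if the cached pointer refers to the blob's own shape storage.
bool PointsToOwnShape(const MiniBlob& blob) {
  return blob.dltensor_.shape == blob.shape_.data();
}
```

Copying a live `MiniBlob b` into `a` leaves `a.dltensor_.shape` equal to `b.shape_.data()`, so `PointsToOwnShape(a)` is false. If `b` were a temporary, that pointer would dangle as soon as `b` was destroyed.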
Quick fix (IMO)
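As a rough illustration of the approach mentioned in the comments (user-defined copy operations that call SetDLTensor()), here is a minimal sketch. `FixedBlob`, `MiniDLTensor`, `MakeTemporary`, and the accessor names are hypothetical, and `std::vector` stands in for MXNet's shape type; it is not the actual patch from #15937.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for DLTensor: holds a non-owning pointer to the shape.
struct MiniDLTensor {
  const int64_t* shape = nullptr;
  int ndim = 0;
};

// Hypothetical stand-in for a TBlob whose copy operations refresh dltensor_.
class FixedBlob {
 public:
  explicit FixedBlob(std::vector<int64_t> shape) : shape_(std::move(shape)) {
    SetDLTensor();
  }

  // Copy constructor: copy the shape, then re-point dltensor_ at our copy.
  FixedBlob(const FixedBlob& other) : shape_(other.shape_) { SetDLTensor(); }

  // Copy assignment: same idea; self-assignment is harmless here.
  FixedBlob& operator=(const FixedBlob& other) {
    shape_ = other.shape_;
    SetDLTensor();
    return *this;
  }

  const MiniDLTensor& dltensor() const { return dltensor_; }
  const std::vector<int64_t>& shape() const { return shape_; }

 private:
  void SetDLTensor() {
    dltensor_.shape = shape_.data();
    dltensor_.ndim = static_cast<int>(shape_.size());
  }

  std::vector<int64_t> shape_;
  MiniDLTensor dltensor_;
};

// Returns a temporary, mimicking a call such as TBlob.reshape().
FixedBlob MakeTemporary() { return FixedBlob(std::vector<int64_t>{4, 5}); }
```

With these definitions, `a = MakeTemporary();` leaves `a.dltensor().shape` pointing at `a`'s own shape storage, so it stays valid after the temporary is destroyed.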
Comments
This bug has nothing to do with np.dot. I just used it for illustration.
Thanks to @yzhliu, @reminisce, and @haojin2 for their help.