Converting MX array to DLPack crashes when MX array goes out-of-scope #13658
Comments
DLPack allows the `strides` field to be NULL. However, PyTorch doesn't accept a DLPack tensor whose `strides` is NULL.
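For context, a NULL `strides` in DLPack denotes a compact, row-major tensor. A minimal sketch (my own helper, not an API of any of these libraries) of the strides such a tensor implicitly has:

```python
# Hypothetical helper: a NULL strides field means compact row-major layout,
# i.e. the element strides a consumer like PyTorch would have to
# reconstruct itself before accepting the tensor.
def compact_strides(shape):
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

print(compact_strides((2, 3, 4)))  # [12, 4, 1]
```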
the error might be trickier. For the DGL code below, the behavior is non-deterministic: we have to run it multiple times before we see the crash. It seems that some memory in the DLPack exported from MXNet isn't referenced. However, if I use the following code:

```python
import os
os.environ['DGLBACKEND'] = 'mxnet'
import mxnet as mx
import numpy as np
import dgl

def foo():
    x = mx.nd.array([0, 5], dtype='int64')
    dl = x.to_dlpack_for_read()
    return dgl.ndarray.from_dlpack(dl)

for i in range(10):
    y = foo()
    y.asnumpy()
```
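If the diagnosis later in this thread is right, a workaround sketch (my own pattern, not a DGL or MXNet API) is to keep the source array referenced for as long as the imported tensor is used:

```python
import os
os.environ['DGLBACKEND'] = 'mxnet'
import mxnet as mx
import dgl

# Return the source array together with the imported tensor so MXNet's
# buffers (data and shape) cannot be freed while `y` is still in use.
def foo_safe():
    x = mx.nd.array([0, 5], dtype='int64')
    dl = x.to_dlpack_for_read()
    return dgl.ndarray.from_dlpack(dl), x

for i in range(10):
    y, keepalive = foo_safe()
    y.asnumpy()
```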
@wkcn This explains the torch case, thank you. In DGL, we actually handled this.
@jermainewang Let me test it for TVM.

```python
import tvm
import mxnet as mx

tvm_a = tvm.ndarray.array([1, 2, 3])
tvm_pack = tvm_a.to_dlpack()
mx_a = mx.nd.from_dlpack(tvm_pack)
print(mx_a)

mx_b = mx.nd.array([4, 5, 6])
mx_pack = mx_b.to_dlpack_for_write()
tvm_b = tvm.nd.from_dlpack(mx_pack)
print(tvm_b)
```

It works fine in TVM.
@wkcn, try the following steps:

```python
import mxnet as mx
import numpy as np
import tvm

def foo():
    x = mx.nd.array([0, 5], dtype='int64')
    dl = x.to_dlpack_for_read()
    return tvm.nd.from_dlpack(dl)

for i in range(10):
    y = foo()
    y.asnumpy()
```

I used an Ubuntu Docker image and could reproduce the error.
@jermainewang I will test it on the Ubuntu server and in Docker.
It seems the bug only appears on Ubuntu 16.04, if I remember correctly. We tested on Ubuntu 18.04, and it works fine.
Yeah, the bug does not occur on my Arch machine either.
Error message:
Reproducing the error on Ubuntu 16.04, I added debug output to TVM's `GetDataSize`:

```cpp
inline size_t GetDataSize(const DLTensor& arr) {
  size_t size = 1;
  cout << "DLTensor Ptr: " << &arr << endl;
  cout << "Shape Ptr: " << arr.shape << endl;
  cout << "ndim: " << arr.ndim << endl;
  for (tvm_index_t i = 0; i < arr.ndim; ++i) {
    cout << "shape[" << i << "] = " << arr.shape[i] << endl;
    size *= static_cast<size_t>(arr.shape[i]);
  }
  cout << "Bits: " << int(arr.dtype.bits) << endl;
  cout << "lanes: " << int(arr.dtype.lanes) << endl;
  // element count times bytes per element, rounded up to whole bytes
  size *= (arr.dtype.bits * arr.dtype.lanes + 7) / 8;
  return size;
}
```

Error message:
In MXNet, the DLTensor is filled in by `SetDLTensor`:

```cpp
inline void SetDLTensor(int dev_mask, int dev_id) {
  dltensor_.data = dptr_;
  dltensor_.ctx = DLContext{static_cast<DLDeviceType>(dev_mask), dev_id};
  dltensor_.ndim = shape_.ndim();
  dltensor_.dtype = DTypeTransform(type_flag_);
  dltensor_.shape = shape_.data();  // raw pointer into this object's shape storage
  dltensor_.strides = nullptr;
  dltensor_.byte_offset = 0;
}
```

It seems that `dltensor_.shape` points into `shape_.data()`, which is released when the source NDArray is destroyed, so the consumer ends up reading a dangling shape pointer.
That's strange. I thought the mutable part was the data pointer, while the shape array should not be changed.
I have fixed the bug in PR #13698 |
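For context, the usual fix for this class of bug is to make the exported DLManagedTensor own a reference to the source array, released only in its deleter. A toy Python model of that ownership (my own sketch; the actual change in PR #13698 is C++ inside MXNet):

```python
class ManagedTensorModel:
    """Toy model of DLManagedTensor ownership; not MXNet's real API."""

    def __init__(self, source_array):
        # Owning this reference keeps the source array's data and shape
        # buffers alive even after the caller's last reference is gone.
        self.manager_ctx = source_array

    def deleter(self):
        # The consuming framework calls this when it is done with the
        # tensor; only now may the source array be freed.
        self.manager_ctx = None
```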
Description
Converting an MX NDArray to DLPack and then to another framework's DLPack-compatible NDArray causes memory corruption when the original MX NDArray goes out of scope.
Environment info (Required)
Package used (Python/R/Scala/Julia): Python
Error Message:
Minimum reproducible example
Torch version v1.0.0
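The example itself did not survive this copy. A sketch consistent with the repros in the comments, assuming the `torch.utils.dlpack` API (per the first comment, PyTorch at the time rejected DLPack tensors whose `strides` is NULL, so this fails on the torch side):

```python
import mxnet as mx
import torch.utils.dlpack

def foo():
    # Convert an MXNet NDArray to DLPack, then hand it to PyTorch.
    x = mx.nd.array([0, 5], dtype='int64')
    return torch.utils.dlpack.from_dlpack(x.to_dlpack_for_read())

y = foo()
print(y)
```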
Steps to reproduce
What have you tried to solve it?
Found this bug in the DGL project, dmlc/dgl#312. Tried: