This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[MXNET-779]Add DLPack Transformation API #12047

Merged · 31 commits · Sep 22, 2018

Commits (the diff below shows changes from 15 of the 31 commits)
822706e
add dlpack convertor api
wkcn Aug 3, 2018
8aac3da
Merge branch 'master' of https://github.com/apache/incubator-mxnet in…
wkcn Aug 3, 2018
ab6fa85
add to_dlpack and from_dlpack for NDArray
wkcn Aug 6, 2018
8c6e9d2
fix dlpack deleter and add unittest for dlpack
wkcn Aug 6, 2018
9fdfa7d
Merge branch 'master' of https://github.com/apache/incubator-mxnet in…
wkcn Aug 6, 2018
1142787
update 3rdparty
wkcn Aug 6, 2018
16df8d5
fix for cpplint
wkcn Aug 6, 2018
bfcffa2
fix pylint and add destructor for dlpack
wkcn Aug 6, 2018
f5c2552
fix pylint in base.py
wkcn Aug 6, 2018
98b5d11
fix lint in base.py
wkcn Aug 6, 2018
7bdde8f
add document for DLPack transformation API
wkcn Aug 6, 2018
f225d27
add to_dlpack_for_read and to_dlpack_for_write
wkcn Aug 7, 2018
afc1518
fix lint for ndarray.py and fix typo in c_api.h
wkcn Aug 7, 2018
8b397fd
fix function name error in c_api
wkcn Aug 7, 2018
d48074a
update code indent in tensor_blob.h and c_api.cc, remove unused type …
wkcn Aug 7, 2018
58c5d87
use MXNDArrayToDLPack in c_api and add compactness check in TBlob
wkcn Aug 9, 2018
72edbf8
merge master and fix merge conflict
wkcn Aug 11, 2018
ef8ffcd
use python function as destructor of DLPack
wkcn Aug 11, 2018
afa1898
remove unused PyObjectHandle and update DLDataTypeTransform
wkcn Aug 11, 2018
a4d3aee
update from_dlpack code
wkcn Aug 11, 2018
493deb0
fix pylint in ndarray.py
wkcn Aug 11, 2018
adf36ef
rename dlpack after using it
wkcn Aug 12, 2018
26db4d0
merge master
wkcn Aug 13, 2018
dec838d
DLManagedTensor manages itself
wkcn Aug 22, 2018
850c3dc
add deleter for TBlob and Chunk in NDArray
wkcn Aug 22, 2018
fc99323
remove unused code in python/mxnet/base.py
wkcn Aug 22, 2018
ffe60c6
retrigger CI
wkcn Aug 22, 2018
cbb17c3
add deleter for shared_ptr<Chunk>
wkcn Sep 10, 2018
e56be1f
Merge branch 'master' into DLPack-convertor-API
wkcn Sep 10, 2018
b1204bc
compilation okay
wkcn Sep 10, 2018
fe1387f
fix cpplint
wkcn Sep 10, 2018
55 changes: 55 additions & 0 deletions include/mxnet/c_api.h
@@ -93,6 +93,10 @@ typedef void *CudaModuleHandle;
typedef void *CudaKernelHandle;
/*! \brief handle to a Profile object (domain, duration, counter, etc.) */
typedef void *ProfileHandle;
/*! \brief handle to DLManagedTensor */
typedef void *DLManagedTensorHandle;
/*! \brief handle to PyObject */
typedef void *PyObjectHandle;

typedef void (*ExecutorMonitorCallback)(const char*,
NDArrayHandle,
@@ -737,6 +741,57 @@ MXNET_DLL int MXNDArrayGetShape(NDArrayHandle handle,
*/
MXNET_DLL int MXNDArrayGetData(NDArrayHandle handle,
void **out_pdata);
/*!
 * \brief Create a reference view of an NDArray, represented as a
 *        DLManagedTensor, that is valid until all pending writes
 *        to the NDArray have finished.
 * \param handle the handle to the ndarray
 * \param out_dlpack pointer holder to get the pointer of the DLManagedTensor
 * \return 0 when success, -1 when failure happens
 */
MXNET_DLL int MXNDArrayToDLPackForRead(NDArrayHandle handle,

Member: From an API point of view, we can just expose ToDLPack and, in the Python API, explicitly call wait_for_read and wait_for_write.

Member Author: I will modify it.

DLManagedTensorHandle *out_dlpack);

/*!
 * \brief Create a reference view of an NDArray, represented as a
 *        DLManagedTensor, that is valid until all pending reads and
 *        writes to the NDArray have finished.
 * \param handle the handle to the ndarray
 * \param out_dlpack pointer holder to get the pointer of the DLManagedTensor
 * \return 0 when success, -1 when failure happens
 */
MXNET_DLL int MXNDArrayToDLPackForWrite(NDArrayHandle handle,
DLManagedTensorHandle *out_dlpack);

/*!
 * \brief Create an NDArray backed by a DLPack tensor.
 *
 * This allows us to create an NDArray using the memory
 * allocated by an external deep learning framework
 * that is DLPack compatible.
 *
 * The memory is retained until the NDArray goes out of scope.
 *
 * \param dlpack the pointer of the input DLManagedTensor
 * \param out_handle pointer holder to get the pointer of the NDArray
 * \return 0 when success, -1 when failure happens
 */
MXNET_DLL int MXNDArrayFromDLPack(DLManagedTensorHandle dlpack,
NDArrayHandle *out_handle);
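
For illustration, a minimal C++ sketch of a round trip through these two calls (hypothetical usage, not part of the diff; error handling omitted):

#include <mxnet/c_api.h>

// Wrap an existing NDArray as a DLManagedTensor, then view it as a new NDArray.
// `src` is assumed to be a valid NDArrayHandle created elsewhere.
void RoundTrip(NDArrayHandle src) {
  DLManagedTensorHandle dlpack = nullptr;
  MXNDArrayToDLPackForRead(src, &dlpack);   // waits for pending writes on src

  NDArrayHandle dst = nullptr;
  MXNDArrayFromDLPack(dlpack, &dst);        // dst shares src's memory

  MXNDArrayFree(dst);                       // the shared memory is released
                                            // once no NDArray references it
}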
/*!
 * \brief Delete a DLPack tensor by calling its deleter
 * \param dlpack the pointer of the input DLManagedTensor
 * \return 0 when success, -1 when failure happens
 */
MXNET_DLL int MXNDArrayCallDLPackDeleter(DLManagedTensorHandle dlpack);

/*!
 * \brief Destructor for a PyCapsule that stores a DLManagedTensor
 * \param dlpack_capsule the pointer of a PyCapsule storing the DLManagedTensor
 */
MXNET_DLL void MXNDArrayCallDLPackCapsuleDeleter(PyObjectHandle dlpack_capsule);

Member: I am less certain why we need the deleter function here; can it be handled directly on the Python/Cython side?

Member Author (wkcn, Aug 9, 2018): I tried to implement the deleter function in Python, but it may be released by the Python GC before it is called (see the test code), which raises a segmentation fault. The Python frontend of MXNet uses both Python (ctypes) and Cython, and it may be impossible to implement the deleter function in ctypes, so the deleter function should be implemented in C++.
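
A minimal sketch of what such a C++-side capsule destructor can look like (illustrative, assuming the CPython API and the "dltensor"/"used_dltensor" naming convention; not necessarily the merged code):

#include <Python.h>
#include <dlpack/dlpack.h>

// Destructor installed on the "dltensor" PyCapsule. It lives in the MXNet
// shared library, so the Python GC cannot release it before it runs.
extern "C" void MXNDArrayCallDLPackCapsuleDeleter(void* dlpack_capsule) {
  PyObject* capsule = static_cast<PyObject*>(dlpack_capsule);
  // A consumed capsule is renamed to "used_dltensor"; only an unconsumed
  // capsule still owns the tensor and needs its deleter called.
  if (PyCapsule_IsValid(capsule, "dltensor")) {
    auto* tensor = static_cast<DLManagedTensor*>(
        PyCapsule_GetPointer(capsule, "dltensor"));
    if (tensor != nullptr && tensor->deleter != nullptr) {
      tensor->deleter(tensor);
    }
  }
}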


Member: Take a look at the destructor in apache/tvm#1573.


Member: There is some subtlety here, but it can nevertheless be implemented.


Member Author (wkcn): I tried writing a Python function as the destructor, but it couldn't pass CI (see the MXNet CI result). All the Windows test_dlpack runs failed because the destructor written in Python was released before it was called.

PyTorch implements the destructor with the Python C API in C++, and CuPy implements it in Cython, so in both cases the code is built as C++. MXNet, however, uses ctypes as well as Cython, and I couldn't find a better way to implement the destructor than writing it in the MXNet C++ API.


Member Author (wkcn, Aug 11, 2018): Yes, I knew the trick and tried it in my previous PR, but it failed the Windows tests (related CI). It seems TVM's CI doesn't run Windows tests, so its CI passes. The cause is that the destructor is released by the Python GC before it is called, and the GC release order differs between Linux and Windows.

On Linux, the destructor is called first and released afterwards, so it works. On Windows, the destructor is released before it is called, so it doesn't.


Member: This is strange, as the destructor itself sits in the global scope and should be destructed after the dltensors (which have a local scope).


Member Author (wkcn): Yes. In the test code, it works on Linux but fails on Windows.


Member (tqchen, Aug 11, 2018): I see a few problems in the particular gist you pasted:

  • The destructor needs to be declared at global scope (instead of being constructed while being passed as an argument).
  • The C string needs to outlive the capsule (construct a global string).
  • The function needs to outlive the capsule (construct the C function and put it at global/module scope).
import ctypes

# The destructor type, the destructor object, and the capsule name all live
# at module (global) scope, so they outlive any capsule created in test().
cfunc = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

def dfunc(dltensor):
    # Capsule destructor: recover the capsule object if needed.
    pycaps = ctypes.cast(dltensor, ctypes.py_object)

c_destructor = cfunc(dfunc)
c_str_dltensor = ctypes.c_char_p(b"dltensor")

def test():
    # 1 stands in for a real pointer; PyCapsule_New only needs a non-NULL value.
    a = ctypes.pythonapi.PyCapsule_New(1, c_str_dltensor, c_destructor)

test()


Member Author (wkcn, Aug 11, 2018): Thank you! It works on both Windows and Linux. I have updated the PR.


/*!
 * \brief get the type of the data in NDArray
 * \param handle the handle to the ndarray
20 changes: 20 additions & 0 deletions include/mxnet/ndarray.h
@@ -519,6 +519,26 @@ class NDArray {
return ret;
}

/*!
 * \brief Create a reference view of this NDArray,
 *        represented as a DLManagedTensor.
 * \return A DLManagedTensor
 */
DLManagedTensor* ToDLPack() const;

/*!
 * \brief Create an NDArray backed by a DLPack tensor.
 *
 * This allows us to create an NDArray using the memory
 * allocated by an external deep learning framework
 * that is DLPack compatible.
 *
 * The memory is retained until the NDArray goes out of scope.
 *
 * \return The created NDArray view.
 */
static NDArray FromDLPack(DLManagedTensor* tensor);

/*!
* \brief Update ndarray chunk storage handles using existing ndarray storage handles
* Also update the aux_handle, aux_shapes and aux_types.
47 changes: 46 additions & 1 deletion include/mxnet/tensor_blob.h
@@ -104,6 +104,14 @@ class TBlob {
: dptr_(dptr), shape_(shape), type_flag_(type_flag) {
SetDLTensor(dev_mask, dev_id);
}
/*!
 * \brief constructor that constructs a TBlob from a DLTensor
 * \param dltensor the source DLTensor
 */
explicit TBlob(const DLTensor &dltensor) : dptr_(dltensor.data),

Member: Need to add a compactness check.

Member: Specifically, TBlob only supports compact tensors; we need to check that strides == null, or that the strides reflect a compact layout.

Member Author (wkcn): Thanks. I will move the strides check from ndarray.cpp to tensor_blob.h.

      shape_(TShape(dltensor.shape, dltensor.shape + dltensor.ndim)),
      type_flag_(DLDataTypeTransform(dltensor.dtype)), dltensor_(dltensor) {
  }
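
A minimal sketch of the compactness condition being discussed (illustrative; the check merged into tensor_blob.h may differ):

#include <dlpack/dlpack.h>

// A tensor is compact (C-contiguous) when strides are absent, or when each
// stride equals the product of all later dimensions, i.e. the row-major
// strides implied by the shape.
inline bool IsCompact(const DLTensor& t) {
  if (t.strides == nullptr) return true;
  int64_t expected = 1;
  for (int i = t.ndim - 1; i >= 0; --i) {
    if (t.shape[i] != 1 && t.strides[i] != expected) return false;
    expected *= t.shape[i];
  }
  return true;
}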
/*!
* \brief constructor from tensor
* \param src source tensor
@@ -336,14 +344,51 @@ class TBlob {
}
}
}
  static int DLDataTypeTransform(DLDataType dldata_type) {
    if (dldata_type.lanes != 1) {
      LOG(FATAL) << "Unsupported DLDataType whose lanes != 1";
    }
    switch (dldata_type.code) {
      case kDLFloat:
        switch (dldata_type.bits) {
          case 16:
            return mshadow::kFloat16;
          case 32:
            return mshadow::kFloat32;
          case 64:
            return mshadow::kFloat64;
        }
        break;
      case kDLUInt:
        switch (dldata_type.bits) {
          case 8:
            return mshadow::kUint8;
        }
        break;
      case kDLInt:
        switch (dldata_type.bits) {
          case 8:
            return mshadow::kInt8;
          case 32:
            return mshadow::kInt32;
          case 64:
            return mshadow::kInt64;
        }
        break;
    }
    LOG(FATAL) << "Unknown DLDataType{" << dldata_type.code
               << ", " << dldata_type.bits
               << ", " << dldata_type.lanes << "}";
    return mshadow::kFloat32;
  }

  inline void SetDLTensor(int dev_mask, int dev_id) {
    dltensor_.data = dptr_;
    dltensor_.ctx = DLContext{static_cast<DLDeviceType>(dev_mask), dev_id};
    dltensor_.ndim = shape_.ndim();
    dltensor_.dtype = DTypeTransform(type_flag_);
    dltensor_.shape = shape_.data();
-    dltensor_.strides = NULL;
+    dltensor_.strides = nullptr;
    dltensor_.byte_offset = 0;
  }

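Taken together, the new constructor and DLDataTypeTransform above let a TBlob wrap a buffer described by a DLTensor; a minimal sketch (hypothetical helper, CPU context assumed):

#include <mxnet/tensor_blob.h>

// Wrap an external float buffer in a TBlob via the DLTensor constructor.
mxnet::TBlob WrapExternal(float* data, int64_t* shape, int ndim) {
  DLTensor dl;
  dl.data = data;
  dl.ctx = DLContext{kDLCPU, 0};
  dl.ndim = ndim;
  dl.dtype = DLDataType{kDLFloat, 32, 1};  // DLDataTypeTransform -> mshadow::kFloat32
  dl.shape = shape;
  dl.strides = nullptr;                    // compact layout, as TBlob requires
  dl.byte_offset = 0;
  return mxnet::TBlob(dl);                 // the constructor added in this diff
}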
7 changes: 5 additions & 2 deletions python/mxnet/_ctypes/ndarray.py
@@ -31,21 +31,24 @@

class NDArrayBase(object):
    """Base data structure for ndarray"""
-    __slots__ = ["handle", "writable"]
+    __slots__ = ["handle", "writable", "dlpack"]
    # pylint: disable= no-member

-    def __init__(self, handle, writable=True):
+    def __init__(self, handle, writable=True, dlpack=None):

Member: dlpack should not be a member; the PyCapsule manages itself.

Member Author (wkcn, Aug 9, 2018): The dlpack member of NDArray is the PyCapsule returned by to_dlpack_for_read/write. When from_dlpack is called, the NDArray needs to manage the release of NDArrayDLManager::handle in ndarray.cc, e.g.:

import mxnet as mx

a = mx.nd.array([1, 2, 3])  # denote the TBlob that stores the data as x
pack = a.to_dlpack_for_write()
b = mx.nd.from_dlpack(pack)
del a, pack
# a and the PyCapsule pack have been released;
# b needs to manage the release of TBlob x.

NDArray doesn't have a deleter function, so I made dlpack a member of NDArray. When the reference count of the dlpack member drops to zero, the TBlob is released. Is there another way to keep the reference?


Member: A better way is to keep the NDArray's shared_ptr inside the manager_ctx itself; take a look at TVM's NDArray-to-DLManagedTensor implementation.


Member Author (wkcn, Aug 10, 2018): NDArray in MXNet and in TVM are different. TVM's NDArray has IncRef and DecRef functions to change the reference count, whereas MXNet's uses NDArray::ptr_ (a std::shared_ptr) to manage it, and NDArray::ptr_ is a private member of NDArray. This PR is similar to PyTorch's DLPack implementation (code): I added an NDArrayDLManager to manage the reference count of the NDArray (line 315 in src/ndarray/ndarray.cc).

Setting dlpack as a member of the Python NDArray class avoids releasing the data store: when the original NDArray and the PyCapsule (DLPack) are released, the new NDArray (generated by from_dlpack) still exists.


Member: You can create a new NDArray() that copies the original NDArray (which increases the refcount) and put that in as the context.


Member: In your case, when a gets deleted, b still holds an NDArrayDLManager, which is allocated by new, and that object still holds an NDArray (which holds a shared_ptr), so the original resource won't be released.


Member: You need to be careful to use the shape from the same NDArray in your NDArrayDLManager.
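
A compact sketch of the pattern being described (illustrative; the NDArrayDLManager merged in src/ndarray/ndarray.cc may differ in detail):

#include <mxnet/ndarray.h>
#include <dlpack/dlpack.h>

// The manager holds an NDArray by value; copying an NDArray copies its
// internal shared_ptr, so the underlying data stays alive until the
// DLManagedTensor's deleter runs.
struct NDArrayDLManager {
  mxnet::NDArray handle;   // copy of the source NDArray (bumps the refcount)
  DLManagedTensor tensor;
};

DLManagedTensor* ToDLPack(const mxnet::NDArray& nd) {
  auto* mgr = new NDArrayDLManager();
  mgr->handle = nd;  // keep the data alive
  // Use the stored copy (not `nd`) so dl_tensor's shape pointer stays valid.
  mgr->tensor.dl_tensor = mgr->handle.data().dltensor();
  mgr->tensor.manager_ctx = mgr;
  mgr->tensor.deleter = [](DLManagedTensor* t) {
    // Deleting the manager releases the NDArray copy and, with it, the
    // last reference to the data if nothing else holds it.
    delete static_cast<NDArrayDLManager*>(t->manager_ctx);
  };
  return &mgr->tensor;
}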


Member Author (wkcn, Aug 11, 2018): Thanks. But I think the new NDArray object b can't hold the same shared_ptr as the original NDArray a; b only gets the data pointer, not a shared_ptr, from the DLPack.

Consider the other case:

import torch
from torch.utils import dlpack
import mxnet as mx

a = torch.tensor([1, 2, 3])
pack = dlpack.to_dlpack(a)
b = mx.nd.from_dlpack(pack)
del a, pack

When dlpack.to_dlpack is called, PyTorch allocates an ATenDLMTensor, which increases the refcount of the Torch tensor (code). After the variables a and pack are released, the ATenDLMTensor still exists. I think the deleter should be called by the new NDArray b when b is released (refer to PyTorch's FromDLPack); however, NDArray doesn't have an explicit deleter parameter.

In my PR, from_dlpack copies the dlpack object. When the old pack is released, it doesn't call the deleter. The copied dlpack becomes a member of the new NDArray b, as NDArray(handle=handle, dlpack=dlpack_copy). When b is released, b.dlpack is released and calls the deleter, and the deleter releases the NDArrayDLManager or ATenDLMTensor; the refcount of the original NDArray a then decreases by 1.


Member (tqchen, Aug 11, 2018): If you copy the NDArray, the copies hold the same shared_ptr to the data. Note that a shared_ptr can be copied, and its reference counter is managed automatically.


Member Author (wkcn, Aug 12, 2018): I made a copy of the NDArray a member of NDArrayDLManager, and the copy increases the refcount. I'm confused about how to modify the PR. After creating a new Python NDArray from DLPack and then deleting the old Python NDArray and the PyCapsule (DLPack), which object will call the deleter function?

In my case, when a gets deleted, how does b hold the NDArrayDLManager? It seems b only gets the raw data pointer from DLManagedTensor::dl_tensor::data, which is not a shared pointer. And how does b store the pointer to the NDArrayDLManager in an MXNet NDArray? @tqchen

"""initialize a new NDArray

Parameters
----------
handle : NDArrayHandle
NDArray handle of C API
dlpack : PyCapsule (DLPack)
DLPack Object
"""
if handle is not None:
assert isinstance(handle, NDArrayHandle)
self.handle = handle
self.writable = writable
self.dlpack = dlpack

def __del__(self):
check_call(_LIB.MXNDArrayFree(self.handle))
14 changes: 14 additions & 0 deletions python/mxnet/base.py
@@ -235,6 +235,7 @@ def _load_lib():
CudaModuleHandle = ctypes.c_void_p
CudaKernelHandle = ctypes.c_void_p
ProfileHandle = ctypes.c_void_p
DLPackHandle = ctypes.c_void_p


#----------------------------
@@ -729,3 +730,16 @@ def write_all_str(module_file, module_all_list):
module_op_file.close()
write_all_str(module_internal_file, module_internal_all)
module_internal_file.close()

ctypes.pythonapi.PyCapsule_New.restype = ctypes.py_object
ctypes.pythonapi.PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p,
                                           ctypes.c_void_p]

ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_void_p
ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

ctypes.pythonapi.PyCapsule_SetName.restype = ctypes.c_int
ctypes.pythonapi.PyCapsule_SetName.argtypes = [ctypes.py_object, ctypes.c_char_p]

_LIB.MXNDArrayCallDLPackCapsuleDeleter.restype = None
_LIB.MXNDArrayCallDLPackCapsuleDeleter.argtypes = [ctypes.c_void_p]
4 changes: 3 additions & 1 deletion python/mxnet/cython/ndarray.pyx
@@ -30,6 +30,7 @@ cdef class NDArrayBase:
    # handle for symbolic operator.
    cdef NDArrayHandle chandle
    cdef int cwritable
    cdef object dlpack

    cdef _set_handle(self, handle):
        cdef unsigned long long ptr
@@ -52,9 +53,10 @@
        def __get__(self):
            return bool(self.cwritable)

-    def __init__(self, handle, writable=True):
+    def __init__(self, handle, writable=True, dlpack=None):
         self._set_handle(handle)
         self.cwritable = writable
         self.dlpack = dlpack

    def __dealloc__(self):
        CALL(MXNDArrayFree(self.chandle))