Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ambiguity] Does DLManagedTensor::deleter delete its function argument? #39

Closed
junrushao opened this issue May 21, 2019 · 6 comments
Closed

Comments

@junrushao
Copy link
Member

It seems to me an undefined behavior in our protocol whether the deleter below deletes self as well. The inconsistency causes a recent crash when using MXNet as a backend for DGL, reported by @zheng-da.

/*! \brief Destructor signature void (*)(void*) - this should be called
* to destruct manager_ctx which holds the DLManagedTensor. It can be NULL
* if there is no way for the caller to provide a reasonable destructor.
*/
void (*deleter)(struct DLManagedTensor * self);

I did a quick check and found that the deleters in TVM and DGL do free self, but haven't verified it on PyTorch & Chainer side. Could someone help with this issue? Thanks!

@junrushao junrushao changed the title [ambiguity] Does DLManagedTensor::deleter deletes its function argument as well? [ambiguity] Does DLManagedTensor::deleter delete its function argument? May 21, 2019
@junrushao
Copy link
Member Author

I sent a fix on the MXNet side that allows both
apache/mxnet#15016

@tqchen
Copy link
Member

tqchen commented May 21, 2019

The deleter may not its function argument, but simply implies that the DLManagedTensor no longer want the object. The behavior could be decref if self is also reference counted and shared with others, or delete self if self is a new object and DLManagedTensor is the sole owner.

To elaborate further, the deleter should be sufficient to delete self and no further deletion is needed

@junrushao
Copy link
Member Author

@tqchen I agree with your points and suggest we explicitly state that in our protocol.

The bug I introduced in MXNet is that I didn't know the deleter will free the opaque pointer, which causes a double free in DGL.

@junrushao
Copy link
Member Author

For example, the deleter here doesn't delete the given handle because the handle is transient.

@tqchen
Copy link
Member

tqchen commented May 21, 2019

a PR to clarify the protocol is more than welcomed

@junrushao
Copy link
Member Author

Will do

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants