Is it necessary to wait for the CUDA stream when calling WaitToRead or WaitToWrite?
#12823
Comments
Thank you for posting your issue. We will look into this. @mxnet-label-bot [Bug, Cuda]
@piyushghai Thank you!
MXNet calls stream->wait at the operator level: https://github.com/apache/incubator-mxnet/blob/master/src/imperative/imperative_utils.h#L405
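To illustrate why an operator-level stream wait makes an engine-level wait sufficient, here is a toy model in plain Python. It is only a sketch and not MXNet code: `ToyStream`, `ToyArray`, and `demo` are hypothetical names, a worker thread stands in for a CUDA stream, and a `threading.Event` gate stands in for a slow kernel so that the outcome is deterministic.

```python
import queue
import threading


class ToyStream:
    """Hypothetical stand-in for a CUDA stream: runs queued work in order."""

    def __init__(self):
        self._tasks = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        while True:
            task = self._tasks.get()
            task()
            self._tasks.task_done()

    def launch(self, task):
        # Like a kernel launch: enqueue the work and return immediately.
        self._tasks.put(task)

    def wait(self):
        # Like a stream wait: block until all queued work has finished.
        self._tasks.join()


class ToyArray:
    """Hypothetical stand-in for an array whose writers are tracked by an engine."""

    def __init__(self):
        self.value = 0
        self._pending = []

    def push(self, op):
        # Run the operator asynchronously on an "engine" thread.
        worker = threading.Thread(target=op)
        worker.start()
        self._pending.append(worker)

    def wait_to_read(self):
        # Waits only for the engine operators, never for the stream itself.
        for worker in self._pending:
            worker.join()
        self._pending.clear()


def demo(sync_stream):
    stream = ToyStream()
    arr = ToyArray()
    gate = threading.Event()  # the "kernel" cannot finish until the gate is set
    if sync_stream:
        gate.set()  # let the kernel run; the operator blocks until it is done

    def op():
        def kernel():
            gate.wait()
            arr.value = 42

        stream.launch(kernel)
        if sync_stream:
            stream.wait()  # operator-level wait on the stream

    arr.push(op)
    arr.wait_to_read()
    seen = arr.value  # what the host observes right after wait_to_read
    gate.set()        # release the kernel in the unsynchronized case
    stream.wait()
    return seen
```

In this model, `demo(sync_stream=False)` reads the stale value because the operator returned while its kernel was still queued, whereas `demo(sync_stream=True)` reads the written value because the operator itself did not complete until the stream had drained.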
@eric-haibin-lin I call I will check it.
Did you make sure the reference to the original NDArray is kept and the memory is not freed?
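The question about keeping the original array alive can be illustrated with a small pure-Python sketch. It is not MXNet or DLPack code: `Buffer`, `ExportedHandle`, and `owner_survives` are hypothetical names, and a weak reference is used only to observe whether the owner has been collected. The idea is that a handle which borrows memory should hold a strong reference to its owner for as long as the handle is in use.

```python
import gc
import weakref


class Buffer:
    """Hypothetical stand-in for an array object that owns its memory."""

    def __init__(self, size):
        self.data = bytearray(size)


class ExportedHandle:
    """Hypothetical stand-in for an exported handle that borrows a Buffer's memory."""

    def __init__(self, owner, keep_owner):
        # A strong reference keeps the owner (and its memory) alive
        # for as long as the handle exists.
        self._owner = owner if keep_owner else None


def owner_survives(keep_owner):
    buf = Buffer(16)
    probe = weakref.ref(buf)  # observes collection without keeping buf alive
    handle = ExportedHandle(buf, keep_owner)
    del buf                   # the caller drops its own reference
    gc.collect()
    alive = probe() is not None
    del handle
    return alive
```

When the handle does not keep its owner, the owner is collected as soon as the caller drops it. In native code, that is the situation in which a borrowed pointer would dangle.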
@eric-haibin-lin Thank you!
Sorry, there is a bug in my project.

class CFuncDef:
    [...]
    def __call__(self, arg_datas, arg_types, dev_id):
        if dev_id is None:
            ctx = 'cpu'
        else:
            set_device(dev_id)
            ctx = gpu_ctx_name
        # function loader
        func = self.loader(self, arg_types, ctx, **self.loader_kwargs)
        return func(*arg_datas)

Calling Solved it.
Good to know it's resolved.
Description
Hi there!
I found a problem with asynchronous execution.
In the two functions NDArray::WaitToRead and NDArray::WaitToWrite, there is no statement that waits for the CUDA stream to finish. This means that a task pushed before calling either function may start executing only after the call has returned, whereas any task pushed before the call should have finished executing by the time the call returns.
In the PR "[MXNET-779] Add DLPack Transformation API" that I submitted ([Code] python/mxnet/ndarray/ndarray.py#L3980), there may still be tasks on the CUDA stream after calling MXNDArrayWaitToWrite, because WaitToWrite and WaitToRead don't wait for the CUDA stream to finish. So the data in the DLPack tensor may be wrong.
Environment info (Required)
Package used (Python/R/Scala/Julia): Python
Build info (Required if built from source)
Compiler (gcc/clang/mingw/visual studio):
MXNet commit hash: efa7d3a