test_operator_gpu.test_embedding_with_type 'an illegal memory access was encountered' #17713
Comments
This may be a bug in CUDA 10.0. I can't reproduce it on 10.1.
@MoisesHer Could you take a look?
Yes, I will take a look ASAP.
We have investigated this issue and found a bug in the CUDA 10.0 compiler that affects only code generated for the NVIDIA Turing architecture, i.e. SM_75. We suggest removing the SM_75 architecture when building MXNet with CUDA Toolkit 10.0, while keeping the SM_70 architecture. Note that code generated for SM_70 is compatible with Turing, so Turing GPUs can execute it without any problem.
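As a rough illustration of the suggested workaround, the sketch below drops 7.5 from a semicolon-separated architecture list and prints a configure command that uses the result. The starting list and the use of `MXNET_CUDA_ARCH` as the CMake cache variable are assumptions, not confirmed by this thread; check the config file for your build before relying on them.

```shell
#!/bin/sh
# Assumed architecture list; replace it with the value from your own config.
ARCH_LIST="3.0;5.0;6.0;7.0;7.5"

# Drop the exact entry 7.5 (SM_75) and re-join the rest with semicolons.
# SM_70 stays in the list, so Turing GPUs can still run the generated code.
FILTERED=$(printf '%s\n' "$ARCH_LIST" | tr ';' '\n' | grep -vxF '7.5' | paste -sd ';' -)

echo "Architectures to build: $FILTERED"

# Print (rather than run) a configure command using the filtered list.
# MXNET_CUDA_ARCH is assumed to be the variable that selects target architectures.
echo "cmake -GNinja -DUSE_CUDA=ON -DMXNET_CUDA_ARCH=\"$FILTERED\" .."
```

Printing the command instead of running it keeps the sketch safe to try outside a source checkout; once the variable name is confirmed for your MXNet version, the printed line can be run from the build directory.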
@MoisesHer thanks for investigating the issue. Could you adapt https://github.com/apache/incubator-mxnet/blob/master/config/distribution/linux_cu100.cmake#L36 accordingly and add a comment inline?
Closing, as it's a CUDA bug.
Description
The Embedding operator in test_operator_gpu.test_embedding_with_type triggers an illegal memory access error deterministically on a G4 instance.
Error Message
To Reproduce
nosetests --verbose tests/python/gpu/test_operator_gpu.py -m test_embedding_with_type
Steps to reproduce
CC=clang-9 CXX=clang++-9 cmake -GNinja -DUSE_MKLDNN=1 -DUSE_CUDA=ON .. ; ninja
nosetests --verbose --stop ../tests/python/gpu/test_operator_gpu.py -m test_embedding_with_type
Environment