This repository has been archived by the owner on Nov 17, 2023. It is now read-only.
Description
Some new test cases were added in #16420 to cover the fusion routine of the RNN operators. Since then, test_operator.py:test_rnnrelu_sym has failed intermittently on the online CI, in both the Unix-GPU MKLDNN+GPU and the Unix-GPU NOMKLDNN+GPU pipelines. We do not yet know the root cause of the flakiness, but we can reproduce the inconsistent results locally. Please see the details below.
Environment info (Required)
----------Python Info----------
Version : 3.7.3
Compiler : GCC 7.3.0
Build : ('default', 'Mar 27 2019 22:11:17')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 19.1.1
Directory : /root/miniconda3/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version : 1.6.0
Directory : /root/dev/incubator-mxnet/python/mxnet
Commit hash file "/root/dev/incubator-mxnet/python/mxnet/COMMIT_HASH" not found. Not installed from pre-built package or built from source.
Library : ['/root/dev/incubator-mxnet/lib/libmxnet.so', '/root/dev/incubator-mxnet/python/mxnet/../../lib/libmxnet.so']
Build features:
✔ CUDA
✔ CUDNN
✖ NCCL
✔ CUDA_RTC
✖ TENSORRT
✔ CPU_SSE
✔ CPU_SSE2
✔ CPU_SSE3
✔ CPU_SSE4_1
✔ CPU_SSE4_2
✖ CPU_SSE4A
✔ CPU_AVX
✖ CPU_AVX2
✔ OPENMP
✖ SSE
✔ F16C
✖ JEMALLOC
✖ BLAS_OPEN
✖ BLAS_ATLAS
✔ BLAS_MKL
✖ BLAS_APPLE
✖ LAPACK
✖ MKLDNN
✔ OPENCV
✖ CAFFE
✖ PROFILER
✖ DIST_KVSTORE
✖ CXX14
✖ INT64_TENSOR_SIZE
✖ SIGNAL_HANDLER
✖ DEBUG
✖ TVM_OP
----------System Info----------
Platform : Linux-4.18.0-15-generic-x86_64-with-debian-buster-sid
system : Linux
node : d64ced67d422
release : 4.18.0-15-generic
version : #16~18.04.1-Ubuntu SMP Thu Feb 7 14:06:04 UTC 2019
Package used (Python/R/Scala/Julia):
I'm using the Python package.
How is the relative error computed here? It can produce a flaky report when the absolute values are very small: the absolute difference is only 0.00012941, but the relative difference is larger than 1. Please check how it is calculated; we may need a more robust algorithm for the relative difference.
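To illustrate the point above: a purely relative check blows up when the expected value itself is near zero. A minimal sketch in NumPy — the values and tolerances here are illustrative, chosen only to mirror the reported numbers, not taken from the test suite:

```python
import numpy as np

def relative_error(a, b):
    # Naive relative difference against the expected value b: it blows up
    # when |b| is small, even though the absolute difference is tiny.
    return np.abs(a - b) / np.abs(b)

def robust_close(a, b, rtol=1e-2, atol=2e-4):
    # Combining an absolute and a relative tolerance, in the style of
    # numpy.isclose, keeps tiny absolute differences on near-zero values
    # from tripping the relative check. Tolerances here are illustrative.
    return bool(np.abs(a - b) <= atol + rtol * np.abs(b))

# Values chosen to mirror the report: absolute difference ~0.00012941,
# yet the naive relative difference exceeds 1.
b = 1e-4
a = b + 1.2941e-4
print(relative_error(a, b))  # > 1, although |a - b| is only ~1.29e-4
print(robust_close(a, b))    # passes under the combined tolerance
```

This is essentially the argument for checking `|a - b| <= atol + rtol * |b|` rather than the relative ratio alone.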
I just reverted the unit test for rnn_relu to its original form, due to the urgency of the transition to mkldnn-v1.0. I have tested it locally: some flakiness appeared with this unit test, though it sometimes passes. I am not familiar with the GPU platform, so this needs expertise from others.
Build info (Required if built from source)
MXNet commit hash: 63fbfb1
Build config:
Error Message:
Minimum reproducible example
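The repro script itself is not inlined in the issue (only the data.tar.gz attachment below). For reference, the computation that test_rnnrelu_sym verifies is the vanilla ReLU RNN recurrence; a sketch of the reference forward pass such a test would compare the fused RNN operator against — the function and parameter names here are my own, not from the test:

```python
import numpy as np

def rnn_relu_forward(x, wx, wh, bx, bh, h0):
    # Single-layer unidirectional RNN with ReLU activation:
    #   h_t = relu(x_t @ Wx.T + bx + h_{t-1} @ Wh.T + bh)
    # x: (seq_len, batch, input_size); h0: (batch, hidden_size).
    h = h0
    outputs = []
    for t in range(x.shape[0]):
        h = np.maximum(x[t] @ wx.T + bx + h @ wh.T + bh, 0.0)
        outputs.append(h)
    return np.stack(outputs), h

rng = np.random.default_rng(0)
seq_len, batch, input_size, hidden = 5, 2, 4, 3
x = rng.standard_normal((seq_len, batch, input_size), dtype=np.float32)
wx = rng.standard_normal((hidden, input_size), dtype=np.float32)
wh = rng.standard_normal((hidden, hidden), dtype=np.float32)
bx = np.zeros(hidden, dtype=np.float32)
bh = np.zeros(hidden, dtype=np.float32)
h0 = np.zeros((batch, hidden), dtype=np.float32)

out, hn = rnn_relu_forward(x, wx, wh, bx, bh, h0)
print(out.shape, hn.shape)  # (5, 2, 3) (2, 3)
```

Because ReLU is unbounded, hidden-state magnitudes can grow across time steps, which is one plausible reason small numerical differences between implementations get amplified into large relative errors.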
Steps to reproduce
data.tar.gz