flaky test: test_operator_gpu.test_depthwise_convolution #12203
Comments
@lebeg Thanks for filing the issue. We will look into it. |
Fix in #12402 |
@mseth10 Happening again after increasing tolerance level: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-12429/1/pipeline |
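For context on what "increasing the tolerance level" means here: the CPU (MKL-DNN) and GPU implementations of depthwise convolution can produce slightly different results due to floating point summation order, so the test compares their outputs within rtol/atol bounds, and one proposed fix is to loosen those bounds. Below is a minimal sketch of such a comparison, not the actual test code; the shapes and tolerance values are made up for illustration.

```python
# Illustrative only: compare depthwise convolution outputs on CPU and GPU
# within relaxed tolerances. Shapes and rtol/atol values are placeholders,
# not taken from the actual test.
import numpy as np
import mxnet as mx
from mxnet.test_utils import assert_almost_equal

data = np.random.uniform(-1, 1, (4, 8, 16, 16)).astype(np.float32)
weight = np.random.uniform(-1, 1, (8, 1, 3, 3)).astype(np.float32)

outputs = []
for ctx in (mx.cpu(), mx.gpu(0)):  # assumes a GPU build and an available GPU
    d = mx.nd.array(data, ctx=ctx)
    w = mx.nd.array(weight, ctx=ctx)
    # num_group == num_filter == number of input channels -> depthwise convolution
    out = mx.nd.Convolution(data=d, weight=w, kernel=(3, 3), pad=(1, 1),
                            num_filter=8, num_group=8, no_bias=True)
    outputs.append(out.asnumpy())

# Loosening rtol/atol is the "tolerance level" knob discussed in this issue.
assert_almost_equal(outputs[0], outputs[1], rtol=1e-3, atol=1e-3)
```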
As far as I know this is still an issue: |
During testing we found another failure seed: |
But it seems that even with fixed seeds the test fails non-deterministically. |
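For anyone trying to reproduce with a pinned seed: MXNet's test suite normally fixes seeds through the @with_seed decorator in tests/python/unittest/common.py, which seeds Python's random, NumPy, and MXNet RNGs (and can be driven via the MXNET_TEST_SEED environment variable). Below is a rough sketch of pinning the RNG state by hand; the seed value is a placeholder, not the failure seed mentioned above.

```python
# Sketch of pinning RNG state by hand, roughly what @with_seed does for a test.
# SEED is a placeholder value, not the failure seed referenced in this issue.
import random
import numpy as np
import mxnet as mx

SEED = 1234  # placeholder
random.seed(SEED)
np.random.seed(SEED)
mx.random.seed(SEED)

# ...run the body of test_depthwise_convolution here. Even with all three RNGs
# pinned, GPU-side sources of non-determinism (e.g. cuDNN algorithm selection,
# atomics in reductions) can still change results between runs, which matches
# the observation that fixed seeds do not make the failure deterministic.
```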
This flaky test issue has previously been identified (#8712) and fixed (#10365) for Python2: MKLDNN-CPU. During this fix (PR discussion), it was identified that the problem still exists for Python2: MKLDNN-GPU. PR #10578 supposedly fixed the issue, but it appears the test still fails non-deterministically. Can you please have a look at this issue? @nihui @xinyu-intel @pengzhao-intel @zheng-da |
Reproduction steps (from https://cwiki.apache.org/confluence/display/MXNET/Reproducing+test+results):
- Spin up a p3.8xlarge instance with an Ubuntu base DLAMI and at least 150GB of EBS storage.
- Clone and build MXNet (the exact commands are in the comment below).
- Enable the test: comment out *line 1634 in file tests/python/unittest/test_operator.py.
- Run only this particular test 10,000 times: modify *line 735 in file ci/docker/runtime_functions.sh to the nosetests command shown in the comment below (a minimal in-process alternative is also sketched right after these steps).

*Line numbers correspond to commit e25e18f |
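As an alternative to editing the CI script, a small driver can call the test function directly in a loop. This is only a sketch, not part of the repository: it assumes it is run from the repository root of a GPU-enabled build, and the sys.path entries assume the source tree layout at commit e25e18f.

```python
# Hypothetical helper (not in the repo): repeatedly run only
# test_depthwise_convolution in-process instead of going through the CI script.
import sys
sys.path.insert(0, 'tests/python/unittest')
sys.path.insert(0, 'tests/python/gpu')

from test_operator_gpu import test_depthwise_convolution

for i in range(10000):
    try:
        test_depthwise_convolution()
    except AssertionError as err:
        print('Failed on iteration %d: %s' % (i, err))
        break
else:
    print('10000 iterations passed')
```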
@juliusshufan could you help take a look at this test case? |
@juliusshufan ping |
@mseth10 @lebeg sorry for the late response. I ran the test case 10,000 times with the same seed mentioned in the issue description, but the failure could not be reproduced. May I have your comments? |
@juliusshufan We use dockerized builds and tests and therefore the host system shouldn't matter. You should be able to reproduce the failure by following the steps mentioned by @mseth10 above.

Checkout and build:
git clone --recursive https://github.com/apache/incubator-mxnet.git
cd incubator-mxnet
pip3 install -r ci/requirements.txt
ci/build.py --platform ubuntu_build_cuda /work/runtime_functions.sh build_ubuntu_gpu_mkldnn

Enable the test by commenting out this line in file tests/python/unittest/test_operator.py:
# @unittest.skip("Flaky test https://github.com/apache/incubator-mxnet/issues/12203")

Speed up the testing by running only this particular test 10,000 times: modify the unittest_ubuntu_python2_gpu function in file ci/docker/runtime_functions.sh to use:
MXNET_TEST_COUNT=10000 nosetests-2.7 $NOSE_COVERAGE_ARGUMENTS $NOSE_TIMER_ARGUMENTS --with-xunit --xunit-file nosetests_gpu.xml --verbose tests/python/gpu/test_operator_gpu.py:test_depthwise_convolution

Run the test in the exact environment where it fails:
|
@juliusshufan did you try the dockerized build and test commands on your system? Were you able to reproduce the failure? |
Fix in #14016 |
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-12181/4/pipeline