This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

test_operator.test_laop_3 has fixed seed that can mask flakiness #11720

Open
szha opened this issue Jul 13, 2018 · 11 comments

Comments

@szha
Member

szha commented Jul 13, 2018

The unit test named in the title has been using a fixed seed, which can mask flakiness. Suggested action:

  1. Evaluate whether the test is flaky without the fixed seed. If not, remove the seed. Otherwise, go to step 2.
  2. If the test is flaky, determine whether it exposes an actual uncaught edge case. If so, fix the operator. Otherwise, go to step 3.
  3. If numerical instability is unavoidable, adjust the tolerance level appropriately.
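The triage steps above can be sketched as a small test harness. This is a minimal illustration assuming a NumPy-based test; `TEST_SEED` stands in for MXNet's `MXNET_TEST_SEED` environment variable, and all names here are hypothetical:

```python
import os
import numpy as np

def run_test(seed=None):
    """Run a toy numerical test under a reproducible but not hard-coded seed."""
    # Seed from the environment if provided (mirroring MXNET_TEST_SEED);
    # otherwise draw a random seed so a fixed value cannot mask flakiness.
    if seed is None:
        seed = int(os.environ.get("TEST_SEED", np.random.randint(0, 2**31)))
    rng = np.random.RandomState(seed)
    a = rng.uniform(-1.0, 1.0, size=(4, 4))
    # A numerically stable property to check: (a + a.T) is symmetric.
    sym = a + a.T
    np.testing.assert_allclose(sym, sym.T, rtol=1e-2, atol=1e-2)
    # Return the seed so a failure can be reproduced exactly (step 1).
    return seed
```

Logging the seed on failure is what makes step 1 possible: a flaky run can be replayed with the exact seed that triggered it, as the repro command later in this thread demonstrates.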
@apeforest
Contributor

@szha Thanks for filing this issue. We will investigate these flaky tests.


@frankfliu
Contributor

The issue can be reproduced with:
MXNET_TEST_SEED=333426306 nosetests --logging-level=DEBUG --verbose -s tests/python/unittest/test_operator.py:test_laop_3

AssertionError:
Items are not equal:
Error 2056487.529996 exceeds tolerance rtol=0.010000, atol=0.010000. Location of maximum error:(0, 10), a=20571.134836, b=0.000304
NUMERICAL_data1: array([[-20571.10127895, -20571.16447819, -20571.15281231, ...,
-20571.16473405, 20571.14681937, -20571.09714863],
[-20571.16447819, -20571.14299957, -20571.0931439 , ...,...
BACKWARD_data1: array([[ 0.03820337, -0.02174861, -0.01172468, ..., -0.03253193,
0.00363456, 0.04317206],
[-0.02174861, -0.01322614, 0.03689246, ..., -0.01726587,...
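For context, the failing assertion compares an analytically computed backward gradient against a numerical (finite-difference) estimate under `rtol`/`atol`. A minimal sketch of that style of check, using hypothetical names rather than MXNet's own `check_numeric_gradient`:

```python
import numpy as np

def numeric_grad(f, x, eps=1e-4):
    """Estimate the gradient of scalar-valued f at x by central differences."""
    g = np.zeros_like(x)
    for i in np.ndindex(x.shape):
        xp = x.copy(); xp[i] += eps
        xm = x.copy(); xm[i] -= eps
        # Central difference: (f(x+eps) - f(x-eps)) / (2*eps), O(eps^2) error.
        g[i] = (f(xp) - f(xm)) / (2.0 * eps)
    return g

f = lambda x: float(np.sum(x ** 3))   # analytic gradient: 3 * x**2
x = np.linspace(-1.0, 1.0, 5)
num = numeric_grad(f, x)
ana = 3.0 * x ** 2
# The same style of check that failed above: combined relative/absolute tolerance.
np.testing.assert_allclose(num, ana, rtol=1e-2, atol=1e-2)
```

Finite-difference estimates are inherently sensitive to the step size and the operator's conditioning, which is why seed choice and tolerances matter so much in this test.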

@srochel
Contributor

srochel commented Oct 21, 2018

@mxnet-label-bot [Good First Issue]

@larroy
Contributor

larroy commented Jul 22, 2019

@ChaiBapchya
Contributor

ChaiBapchya commented Jul 22, 2019


larroy added a commit to larroy/mxnet that referenced this issue Aug 6, 2019
Fixes apache#11720
Overall will reduce flakiness of tests using numerical gradients
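The commit message mentions reducing flakiness of tests that use numerical gradients. One common approach to this (a sketch of the general technique, not necessarily what the referenced commit does; helper names are hypothetical) is to judge the error relative to the operands' magnitude and to scale the finite-difference step to the input:

```python
import numpy as np

def rel_err(a, b, eps=1e-12):
    # Normalize the error by the magnitudes of both operands, so gradients
    # with large entries (like the ~2e4 values in the failure above) are not
    # judged by an absolute threshold alone.
    return float(np.max(np.abs(a - b) / (np.abs(a) + np.abs(b) + eps)))

def scaled_step(x, base_eps=1e-4):
    # A fixed step that works near |x| ~ 1 is too coarse or too fine
    # elsewhere; scaling it to the input magnitude keeps the truncation
    # and round-off errors balanced across the tensor.
    return base_eps * np.maximum(1.0, np.abs(x))
```

With a magnitude-relative error metric, a gradient entry of 20571.13 vs 20571.16 counts as a tiny error rather than a huge absolute one.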
@ChaiBapchya
Contributor

ChaiBapchya commented Aug 9, 2019

Another one

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-gpu/detail/PR-15807/2/pipeline

#15807

test_operator.test_laop_6 ...
Error running unittest, python exited with status code C0000005

At C:\jenkins_slave\workspace\ut-python-gpu\ci\windows\test_py2_gpu.ps1:27 char:13
+ if (! $?) { Throw ("Error running unittest, python exited with status ...
+             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (Error running u...s code C0000005:String) [], RuntimeException
    + FullyQualifiedErrorId : Error running unittest, python exited with status code C0000005

@marcoabreu
Contributor

laop_3 and laop_6 are different tests, so I would differentiate between them if possible.

anirudhacharya pushed a commit to anirudhacharya/mxnet that referenced this issue Oct 25, 2020
Fixes apache#11720
Overall will reduce flakiness of tests using numerical gradients