This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Disables flaky test test_operator_gpu.test_deconvolution #14146

Conversation

perdasilva
Contributor

@perdasilva perdasilva commented Feb 13, 2019

Description

Disables flaky test
Relates to #10973

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Disables test_operator_gpu.test_deconvolution test
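
For reference, MXNet flaky tests are typically disabled by skipping them and linking the tracking issue. A minimal, hypothetical sketch of what such a change could look like (the decorator text and placement are assumptions, not the actual diff):

```python
# Hypothetical sketch only; the actual diff is not reproduced in this conversation.
# Flaky MXNet tests are commonly skipped with a reference to the tracking issue.
import unittest

@unittest.skip("Flaky test: https://github.com/apache/incubator-mxnet/issues/10973")
def test_deconvolution():
    ...
```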

Comments

======================================================================
FAIL: test_operator_gpu.test_deconvolution
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Anaconda3\envs\py3\lib\site-packages\nose\case.py", line 197, in runTest
    self.test(*self.arg)
  File "C:\Anaconda3\envs\py3\lib\site-packages\nose\util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\../unittest\common.py", line 173, in test_new
    orig_test(*args, **kwargs)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\../unittest\test_operator.py", line 1413, in test_deconvolution
    pad                 = (1,1)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\../unittest\test_operator.py", line 1305, in check_deconvolution_forward_backward
    assert_almost_equal(out + args_grad_addto_npy[0], args_grad_addto[0].asnumpy(), rtol=1e-3, atol=1e-3)
  File "C:\jenkins_slave\workspace\ut-python-gpu\windows_package\python\mxnet\test_utils.py", line 495, in assert_almost_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
Error 1.290297 exceeds tolerance rtol=0.001000, atol=0.001000.  Location of maximum error:(31, 2, 7, 11), a=0.181059, b=0.182585
 a: array([[[[  71.53303956,   27.22900263,  -81.94053338, ...,  131.83555384,
             0.14714987, -125.67693774],
         [  71.65034077, -127.16477248,   97.06501721, ...,  224.75550074,...
 b: array([[[[  71.53299713,   27.22902107,  -81.94055176, ...,  131.83551025,
             0.14712699, -125.6769104 ],
         [  71.65033722, -127.1647644 ,   97.06500244, ...,  224.75546265,...
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1614629591 to reproduce.
--------------------- >> end captured logging << ---------------------

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-gpu/detail/PR-14144/6/pipeline
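
For context on the failure itself, the reported numbers are consistent with a numpy-style mixed tolerance check, where the per-element violation is |a - b| / (atol + rtol * |b|) and any value above 1.0 fails. A small sketch using the values from the log (the exact formula inside mxnet/test_utils.py is an assumption inferred from the output):

```python
# Values at the reported maximum-error location (31, 2, 7, 11) in the log above.
a, b = 0.181059, 0.182585
rtol, atol = 1e-3, 1e-3

# Numpy-style mixed-tolerance violation; values above 1.0 mean the check fails.
violation = abs(a - b) / (atol + rtol * abs(b))
print(violation)  # ~1.29, matching "Error 1.290297" in the traceback
```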

@perdasilva perdasilva changed the title from "disable test test_operator_gpu.test_deconvolution" to "Disables flaky test test_operator_gpu.test_deconvolution" on Feb 13, 2019
@perdasilva
Contributor Author

@mxnet-label-bot add [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review (PR is waiting for code review) label on Feb 13, 2019
@perdasilva
Contributor Author

@mxnet-label-bot add [flaky, test]

@perdasilva perdasilva force-pushed the disable_flaky_gpu_test_deconvolution branch from 6b90641 to b27a6b1 Compare February 14, 2019 12:31
@apeforest
Contributor

Have you run the tests with the master branch?

@perdasilva
Contributor Author

Not specifically. But it was detected in a PR with unrelated changes. My understanding is that the PR jobs take the current master, apply the changes, and run validation against that. So, in a way, they were run against master as it was at the time of that PR.

Contributor

@apeforest apeforest left a comment

I think this is not a very strict way to identify a flaky test. A flaky test should be one that fails on the master branch. Any PR, no matter how seemingly unrelated, could have a potential impact on the existing code. That's why we have CI in place. I would reject this PR unless the test failure can be reproduced on master.

@perdasilva
Copy link
Contributor Author

perdasilva commented Feb 18, 2019

I think that, as a general statement, what you are saying makes sense. But in this case (#14144), which I would suggest you look at, the changes were to the test script for the installation documentation. I find it difficult to believe that a bash script run by CI would trigger the (flaky) test_operator_gpu.test_deconvolution. Especially since a) the PR was merged after running the job a second time (on the same code) and came up green, and b) the error report from the Jenkins job (below) has all the hallmarks of a flaky test, namely minuscule floating-point variations.

Items are not equal:

Error 1.290297 exceeds tolerance rtol=0.001000, atol=0.001000.  Location of maximum error:(31, 2, 7, 11), a=0.181059, b=0.182585

 a: array([[[[  71.53303956,   27.22900263,  -81.94053338, ...,  131.83555384,
             0.14714987, -125.67693774],
         [  71.65034077, -127.16477248,   97.06501721, ...,  224.75550074,...

 b: array([[[[  71.53299713,   27.22902107,  -81.94055176, ...,  131.83551025,
             0.14712699, -125.6769104 ],
         [  71.65033722, -127.1647644 ,   97.06500244, ...,  224.75546265,...

I would argue that CI is in place to catch potential errors from merging your code into the code base. If it catches an error on one run, but not on a second re-run of the exact same code, then the unreliably failing test is flaky by definition and CI isn't doing its job properly.

Given all the data above, I don't feel the need to spend time reproducing this in master.
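
For anyone who does want to attempt a reproduction on master, the captured log above prints the seed to reuse (MXNET_TEST_SEED=1614629591). A rough sketch, assuming a master checkout with nose installed and the test path taken from the traceback:

```python
# Hypothetical reproduction sketch; adjust the path to your checkout.
import os
import nose

os.environ["MXNET_TEST_SEED"] = "1614629591"  # seed printed in the CI log
nose.run(argv=[
    "nosetests", "-v",
    "tests/python/gpu/test_operator_gpu.py:test_deconvolution",
])
```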

@apeforest
Contributor

apeforest commented Feb 21, 2019

@perdasilva Thanks for your detailed explanation. I am not disagreeing with your analysis. However, such case-by-case analysis based on an individual's expertise is, IMHO, not a scalable mechanism for a software release and deployment process. On the other hand, even if this PR passes and fails CI tests at random times, we cannot exclude the possibility that the flakiness was introduced by this particular PR. The most reliable way to monitor flaky tests should be, and only be, through the master branch over a period of time.

CI is our last guard to protect the quality of our software. I think it's better to be more conservative than optimistic.

@perdasilva
Contributor Author

@apeforest sure, no problem. Let's kick the can down the road.

@perdasilva perdasilva closed this Feb 22, 2019
@perdasilva perdasilva deleted the disable_flaky_gpu_test_deconvolution branch February 22, 2019 13:55
Labels
Flaky, pr-awaiting-review, Test