Flaky test: test_operator.test_op_roi_align #11064

eric-haibin-lin · 2018-05-25T17:15:34Z

======================================================================
FAIL: test_operator.test_op_roi_align
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/work/mxnet/tests/python/unittest/common.py", line 157, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/unittest/test_operator.py", line 6170, in test_op_roi_align
    test_roi_align_value()
  File "/work/mxnet/tests/python/unittest/test_operator.py", line 6149, in test_roi_align_value
    assert np.allclose(data.grad.asnumpy(), dx, atol = 1e-6), np.abs(data.grad.asnumpy() - dx).max()
AssertionError: 1.3150275e-06
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1619190489 to reproduce.
--------------------- >> end captured logging << ---------------------

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11058/1/pipeline


zhreshold · 2018-05-25T17:22:02Z

Relaxing the tolerance with rtol should be fine; the diff is acceptable.

haojin2 · 2018-05-25T23:00:33Z

@zhreshold I increased the rtol to 1e-5 and the test passed 500 consecutive runs. The change is included in #11058.

anirudhacharya · 2018-06-28T02:02:21Z

This failure persists - http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/incubator-mxnet/branches/PR-11229/runs/9/nodes/752/log/?start=0

marcoabreu · 2018-06-28T07:48:42Z

Hi @anirudhacharya, please note that the problem is not specific to this test: many other tests are failing too, as you can see if you scroll up in the log. This is documented in #11395.

ThomasDelteil · 2018-09-13T17:21:48Z

Does it still happen? This seems different from what @anirudhacharya reported, in that I cannot see any other failures above that test: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-12542/3/pipeline

jlcontreras · 2018-12-05T09:41:39Z

Seems to still happen:

======================================================================
FAIL: test_operator_gpu.test_op_roi_align
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python2.7/dist-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 173, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/gpu/../unittest/test_operator.py", line 6994, in test_op_roi_align
    test_roi_align_value()
  File "/work/mxnet/tests/python/gpu/../unittest/test_operator.py", line 6970, in test_roi_align_value
    assert np.allclose(output.asnumpy(), real_output)
AssertionError:
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=35650200 to reproduce.
--------------------- >> end captured logging << ---------------------

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/master/45/pipeline

zhreshold · 2018-12-05T18:56:11Z

Maybe we should take another look. However, it seems the log no longer shows the detailed error?

wkcn · 2018-12-10T12:09:40Z

Maybe there is some difference between the C++ implementation and the Python implementation. I will check it.

There is a floating-point precision problem.
I may have fixed it.

wkcn · 2018-12-11T03:02:49Z

In this unit test, real_output is computed in float64, so comparing it against the operator's output runs into a floating-point precision problem.
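To give a sense of the scale of that precision problem, here is a minimal sketch using two of the values from the a/b dump further down in this comment. It assumes the operator output is float32 (the thread does not say so explicitly, but the printed digits are consistent with it) and uses only NumPy.

```python
import numpy as np

# Two mismatching entries copied from the a/b dump in this comment.
# Assumption: they are float32 values printed with extra digits.
a = np.float32(219.24882507)
b = np.float32(219.24884033)

ulp = float(np.spacing(b))       # gap between adjacent float32 values near b
diff = abs(float(b) - float(a))  # absolute difference, evaluated in float64
rel = diff / abs(float(b))       # relative difference

print("difference in float32 ULPs:", diff / ulp)
print("relative difference:", rel)
```

Under that assumption the two entries are neighbouring float32 values, and their relative difference is far below rtol=1e-5, so entries of this magnitude are not the ones tripping the check.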

When MXNET_TEST_SEED=35650200, the error shows that:

Error 4.049840 exceeds tolerance rtol=0.000010, atol=0.000000.  Location of maximum error:(6, 0, 0, 1), a=0.005887, b=0.005887
 a: array([[[[  172.60527039,   173.5171814 ,   174.42907715,   175.34100342],
         [  195.92706299,   196.83895874,   197.75085449,   198.6627655 ],
         [  219.24882507,   220.16073608,   221.07263184,   221.98455811]],...
 b: array([[[[  172.60527039,   173.5171814 ,   174.42907715,   175.34100342],
         [  195.92706299,   196.83895874,   197.75085449,   198.6627655 ],
         [  219.24884033,   220.16075134,   221.07266235,   221.98455811]],...

I think atol should not be 0.
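A minimal sketch of why atol=0 is fragile for near-zero entries. The helper `violation` is hypothetical and assumes the reported "Error" is the ratio |a - b| / (atol + rtol * |b|), which matches the tolerance form used by `np.allclose`. The value of `a` is illustrative, not taken from the log: an absolute offset of 2.4e-7 on an element of size 0.005887.

```python
import numpy as np

def violation(a, b, rtol, atol):
    """Hypothetical helper: |a - b| divided by the allowed tolerance.

    A result greater than 1 means the comparison would fail.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return np.abs(a - b) / (atol + rtol * np.abs(b))

# Illustrative values: a small reference element and a tiny absolute offset.
b = 0.005887
a = b + 2.4e-7

strict = violation(a, b, rtol=1e-5, atol=0.0)    # purely relative check
relaxed = violation(a, b, rtol=1e-5, atol=1e-6)  # small absolute floor added

print("atol=0    fails:", bool(strict > 1))
print("atol=1e-6 fails:", bool(relaxed > 1))
```

With atol=0 the allowed error shrinks in proportion to |b|, so an offset of a few 1e-7 on a small element exceeds the tolerance several times over, while a small nonzero atol absorbs it.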

eric-haibin-lin added Test Flaky labels May 25, 2018

haojin2 mentioned this issue May 25, 2018

[MXNET-473] Fix for dist_sync_kvstore and test_operator.test_op_roi_align #11058

Merged

5 tasks

eric-haibin-lin closed this as completed May 25, 2018

marcoabreu mentioned this issue Jun 25, 2018

Flaky test on Python2 Windows #11394

Closed

marcoabreu reopened this Jun 25, 2018

marcoabreu closed this as completed Jun 25, 2018

jlcontreras mentioned this issue Dec 5, 2018

Disable flaky test: test_op_roi_align #13546

Closed

marcoabreu reopened this Dec 5, 2018

marcoabreu added the Disabled test label Dec 5, 2018

wkcn mentioned this issue Dec 11, 2018

[MXNET-1258]fix unittest for ROIAlign Operator #13609

Merged

7 tasks

sandeep-krishnamurthy closed this as completed in #13609 Feb 5, 2019
