This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Flaky test test_operator_gpu.test_sparse_dot #10920

Closed
eric-haibin-lin opened this issue May 13, 2018 · 6 comments · Fixed by #12527

@eric-haibin-lin
Member

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10913/7/pipeline

======================================================================
FAIL: test_operator_gpu.test_sparse_dot
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python3.5/dist-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 157, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/gpu/../unittest/test_sparse_operator.py", line 1343, in test_sparse_dot
    lhs_d, rhs_d, False, True)
  File "/work/mxnet/tests/python/gpu/../unittest/test_sparse_operator.py", line 1237, in test_infer_forward_stype
    assert_almost_equal(out.tostype('default').asnumpy(), out_np, rtol=1e-4, atol=1e-5)
  File "/work/mxnet/python/mxnet/test_utils.py", line 493, in assert_almost_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
Error 1.067252 exceeds tolerance rtol=0.000100, atol=0.000010.  Location of maximum error:(34, 15), a=0.022217, b=0.022204
 a: array([[ -9.000863  ,   4.9705057 ,  -2.7022123 , ..., -10.717851  ,
         19.614717  ,  17.951117  ],
       [-19.35049   ,   2.4999516 ,  -7.9741106 , ...,  15.310856  ,...
 b: array([[ -9.000862  ,   4.9705014 ,  -2.7022119 , ..., -10.717848  ,
         19.614723  ,  17.951107  ],
       [-19.350492  ,   2.4999518 ,  -7.9741096 , ...,  15.310851  ,...

@haojin2
Contributor

haojin2 commented May 13, 2018

This should be an rtol/atol problem; I will fix it ASAP.
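
To make the rtol/atol point concrete, here is a minimal sketch of a combined-tolerance check. It assumes the reported "Error" is the ratio |a - b| / (atol + rtol * |b|), with a failure whenever that ratio exceeds 1; this is inferred from the numbers in the log above, not taken from the MXNet source. The helper name `max_violation` is hypothetical. The inputs are the rounded values printed in the first failure, so the ratio comes out near 1.06 rather than the logged 1.067252.

```python
import numpy as np


def max_violation(a, b, rtol=1e-4, atol=1e-5):
    """Hypothetical helper: largest |a - b| / (atol + rtol * |b|) and its index.

    A ratio above 1 means the element is outside the combined tolerance.
    """
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    ratio = np.abs(a - b) / (atol + rtol * np.abs(b))
    flat_index = int(np.argmax(ratio))
    index = tuple(int(i) for i in np.unravel_index(flat_index, ratio.shape))
    return float(ratio[index]), index


# Two element pairs copied from the first failure log (already rounded there):
# one large-magnitude entry and the entry reported as the maximum error.
a = np.array([[-9.000863, 0.022217]])
b = np.array([[-9.000862, 0.022204]])

worst, where = max_violation(a, b)
print("worst ratio:", worst, "at", where)
```

Under this assumption, the large entry passes easily because rtol * |b| dominates its tolerance, while the entry near zero has a tolerance close to atol alone, so an absolute difference of about 1.3e-5 is already enough to fail.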

@haojin2
Contributor

haojin2 commented May 15, 2018

Hit this again:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-10931/7/pipeline

======================================================================
FAIL: test_operator_gpu.test_sparse_dot
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python2.7/dist-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 157, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/gpu/../unittest/test_sparse_operator.py", line 1343, in test_sparse_dot
    lhs_d, rhs_d, False, True)
  File "/work/mxnet/tests/python/gpu/../unittest/test_sparse_operator.py", line 1237, in test_infer_forward_stype
    assert_almost_equal(out.tostype('default').asnumpy(), out_np, rtol=1e-4, atol=1e-5)
  File "/work/mxnet/python/mxnet/test_utils.py", line 493, in assert_almost_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
Error 1.045963 exceeds tolerance rtol=0.000100, atol=0.000010.  Location of maximum error:(1, 3), a=0.018232, b=0.018245
 a: array([[  4.6516194,  -1.9191772, -12.820089 , ...,   8.844241 ,
        -13.252303 , -12.635316 ],
       [ -4.1542487,   7.7496295,  -0.1049877, ...,   4.691146 ,...
 b: array([[  4.65162   ,  -1.919177  , -12.820091  , ...,   8.844247  ,
        -13.252301  , -12.635329  ],
       [ -4.154251  ,   7.749628  ,  -0.10498619, ...,   4.6911426 ,...
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=33958090 to reproduce.
--------------------- >> end captured logging << ---------------------
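
The captured log line says the failure can be reproduced by setting `MXNET_TEST_SEED`. As a rough illustration of how environment-driven seeding of this kind can work, here is a minimal sketch. The helper `seed_from_env` is hypothetical and is not the code in `common.py`. It seeds only Python's `random` and NumPy, and leaves out the MXNet seed mentioned in the log so that the example stays self-contained.

```python
import os
import random

import numpy as np


def seed_from_env(var="MXNET_TEST_SEED"):
    """Hypothetical helper: seed the Python and NumPy RNGs.

    Uses the integer in the environment variable `var` when it is set,
    otherwise draws a fresh seed. Returns the seed so a test harness
    can log it for later reproduction.
    """
    value = os.environ.get(var)
    if value is not None:
        seed = int(value)
    else:
        seed = random.SystemRandom().randint(0, 2**31 - 1)
    random.seed(seed)
    np.random.seed(seed)
    return seed


# Pin the seed reported in the log, then draw the same data twice.
os.environ["MXNET_TEST_SEED"] = "33958090"
seed = seed_from_env()
first = np.random.uniform(size=3)
seed_from_env()
second = np.random.uniform(size=3)
print(seed, np.array_equal(first, second))
```

With the seed pinned, the randomly generated inputs are identical on every run, which is what makes an intermittent tolerance failure like this one repeatable on the same hardware and library versions.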

@eric-haibin-lin
Member Author

======================================================================
FAIL: test_operator_gpu.test_sparse_dot
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python2.7/dist-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 157, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/gpu/../unittest/test_sparse_operator.py", line 1385, in test_sparse_dot
    lhs_d, rhs_d, False, False)
  File "/work/mxnet/tests/python/gpu/../unittest/test_sparse_operator.py", line 1281, in test_infer_forward_stype
    assert_almost_equal(out.tostype('default').asnumpy(), out_np, rtol=1e-4, atol=1e-5)
  File "/work/mxnet/python/mxnet/test_utils.py", line 493, in assert_almost_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
Error 1.040007 exceeds tolerance rtol=0.000100, atol=0.000010.  Location of maximum error:(41, 6), a=-0.002038, b=-0.002048
 a: array([[-17.577438  ,  -1.1808437 ,  13.078005  , ...,  -3.15891   ,
         -6.6346364 ,  -1.1883267 ],
       [ 14.642713  ,  -4.0540075 ,  15.738339  , ..., -14.301232  ,...
 b: array([[-17.577438  ,  -1.1808434 ,  13.078006  , ...,  -3.1589074 ,
         -6.63463   ,  -1.1883259 ],
       [ 14.642715  ,  -4.0540066 ,  15.738338  , ..., -14.301228  ,...
-------------------- >> begin captured logging << --------------------


http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11360/8/pipeline/731

@anirudh2290
Member

Assigned to haibin. @haojin2 has a PR open for this.

@haojin2
Contributor

haojin2 commented Jun 29, 2018

@eric-haibin-lin This should be fixed now.

@zheng-da
Contributor

It seems that the previous PR did not fix the problem completely.

http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-11641/24/pipeline
