This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

CI saw test_numpy_op.py:test_np_sum failure #16475

Closed
DickJC123 opened this issue Oct 14, 2019 · 3 comments
@DickJC123
Contributor

Description

@reminisce @haojin2
Likely needs atol/rtol tweaking. I can reproduce this repeatably with:

MXNET_TEST_SEED=28297241 nosetests --verbose -s tests/python/unittest/test_numpy_op.py:test_np_sum

Seen in:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-cpu/detail/PR-16462/5/pipeline/260/

Error output:

======================================================================
FAIL: test_numpy_op.test_np_sum
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/work/mxnet/tests/python/unittest/common.py", line 177, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/python/mxnet/util.py", line 307, in _with_np_shape
    return func(*args, **kwargs)
  File "/work/mxnet/python/mxnet/util.py", line 491, in _with_np_array
    return func(*args, **kwargs)
  File "/work/mxnet/tests/python/unittest/test_numpy_op.py", line 543, in test_np_sum
    numeric_eps=1e-3, rtol=1e-3, atol=1e-4, dtype=_np.float32)
  File "/work/mxnet/python/mxnet/test_utils.py", line 1017, in check_numeric_gradient
    ("NUMERICAL_%s"%name, "BACKWARD_%s"%name))
  File "/work/mxnet/python/mxnet/test_utils.py", line 535, in assert_almost_equal
    raise AssertionError(msg)
AssertionError:
Items are not equal:
Error 1.242137 exceeds tolerance rtol=0.001000, atol=0.000100.  Location of maximum error:(0, 0, 0, 0), a=0.920296, b=0.921565
 NUMERICAL_x: array([[[[0.9202957, 0.9202957, 0.9202957],
         [0.9202957, 0.9202957, 0.9202957],
         [0.9202957, 0.9202957, 0.9202957]],...
 BACKWARD_x: array([[[[0.92156464, 0.92156464, 0.92156464],
         [0.92156464, 0.92156464, 0.92156464],
         [0.92156464, 0.92156464, 0.92156464]],...
-------------------- >> begin captured logging << --------------------
root: INFO: NumPy-shape semantics has been activated in your code. This is required for creating and manipulating scalar and zero-size tensors, which were not supported in MXNet before, as in the official NumPy library. Please DO NOT manually deactivate this semantics while using `mxnet.numpy` and `mxnet.numpy_extension` modules.
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=28297241 to reproduce.
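For anyone triaging similar failures: the reported "Error 1.242137" is consistent with a violation metric of |a - b| / (atol + rtol * |b|), where a ratio above 1.0 trips the assertion. A minimal sketch (assuming that metric, which the printed numbers appear to match) recomputing the ratio from the values in the error message:

```python
# Gradient values at the reported location of maximum error, (0, 0, 0, 0).
a = 0.9202957   # NUMERICAL_x (finite-difference estimate)
b = 0.92156464  # BACKWARD_x (analytical gradient)
rtol, atol = 1e-3, 1e-4

# Assumed violation metric: ratios above 1.0 fail the check.
ratio = abs(a - b) / (atol + rtol * abs(b))
print(f"violation ratio: {ratio:.4f}")  # close to the reported 1.242137
```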

The test otherwise seems fairly solid; I could not get another failure with 1000 random seeds.

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended label(s): Test, CI

@haojin2
Contributor

haojin2 commented Oct 14, 2019

This happened in the numerical gradient check, which is a bit unstable by nature. I've personally tested with 10000 trials and did not get a failure back then. That said, yes, I think we should raise the tolerance level a little to account for this kind of worst-case scenario; I'll fold this into one of my recent PRs, #16436.
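For context on why such checks are noisy: a numerical gradient check compares a central-difference estimate against the backward pass, and the finite eps combined with float32 rounding makes the estimate inherently imprecise. A toy sketch (not MXNet's `check_numeric_gradient`, just the same idea) on y = x.sum(), whose true gradient is exactly 1.0 everywhere:

```python
import numpy as np

def numeric_grad(f, x, eps=1e-3):
    """Central-difference gradient estimate of a scalar-valued f at x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        orig = x[idx]
        x[idx] = orig + eps
        f_plus = f(x)
        x[idx] = orig - eps
        f_minus = f(x)
        x[idx] = orig  # restore the perturbed element
        grad[idx] = (f_plus - f_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(28297241)  # seed from the report, for flavor
x = rng.standard_normal((3, 3)).astype(np.float32)
g = numeric_grad(lambda a: a.sum(), x)
# The true gradient of sum() is all ones; any residual is pure
# finite-difference and float32 rounding noise.
print(np.abs(g - 1.0).max())
```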

@haojin2
Contributor

haojin2 commented Nov 18, 2019

#16436 has been merged with atol bumped up; not seeing any new reports, so closing the issue now.
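For illustration of why the bump resolves this (the actual new atol value is in #16436; the 1e-3 below is purely a hypothetical number): under a violation metric of |a - b| / (atol + rtol * |b|), a larger atol absorbs the mismatch seen in the original failure.

```python
a, b = 0.9202957, 0.92156464  # gradient values from the original failure
rtol = 1e-3
for atol in (1e-4, 1e-3):  # original atol vs. a hypothetical bumped-up atol
    ratio = abs(a - b) / (atol + rtol * abs(b))
    verdict = "FAIL" if ratio > 1 else "PASS"
    print(f"atol={atol:g}: violation ratio {ratio:.3f} -> {verdict}")
```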

@haojin2 haojin2 closed this as completed Nov 18, 2019