
Flaky test test_operator_gpu.test_convolution_independent_gradients #15603

Open
ChaiBapchya opened this issue Jul 19, 2019 · 11 comments

Comments

@ChaiBapchya
Contributor

This failure occurred on an unrelated PR: #15522

Here's the error log:

======================================================================
FAIL: test_operator_gpu.test_convolution_independent_gradients
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python37\lib\site-packages\nose\case.py", line 198, in runTest
    self.test(*self.arg)
  File "C:\Python37\lib\site-packages\nose\util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\../unittest\common.py", line 177, in test_new
    orig_test(*args, **kwargs)
  File "C:\jenkins_slave\workspace\ut-python-gpu\tests\python\gpu\../unittest\test_operator.py", line 1989, in test_convolution_independent_gradients
    grad2[var_name].asnumpy(), rtol=rtol, atol=atol)
  File "C:\Python37\lib\site-packages\numpy\testing\_private\utils.py", line 1501, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "C:\Python37\lib\site-packages\numpy\testing\_private\utils.py", line 827, in assert_array_compare
    raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.01, atol=0.1

Mismatch: 0.446%
Max absolute difference: 0.7600479
Max relative difference: 0.37727797
 x: array([[[[-3.817751e+01,  1.281843e+02, -3.195532e+01, ...,
           1.479566e+02, -3.144979e+01, -7.199168e+00],
         [-3.450592e+01,  4.369902e+01, -7.741132e+01, ...,...
 y: array([[[[-3.816167e+01,  1.281905e+02, -3.194344e+01, ...,
           1.479433e+02, -3.151185e+01, -7.199774e+00],
         [-3.448948e+01,  4.371869e+01, -7.741202e+01, ...,...
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=22873871 to reproduce.
--------------------- >> end captured logging << ---------------------

CI pipeline - http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-gpu/detail/PR-15522/7/pipeline/
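For reference, here is a minimal sketch of the check that fails, using numpy's assert_allclose with the same tolerances. The values are just the leading elements copied from the printed x/y arrays; the real test compares full MXNet gradient arrays, and the seed from the captured log (MXNET_TEST_SEED=22873871) reproduces the same random inputs.

import numpy as np
from numpy.testing import assert_allclose

rtol, atol = 1e-2, 1e-1  # tolerances from the failing assertion

# Leading elements copied from the printed x/y arrays. These particular
# values are within tolerance; the reported 0.446% mismatch comes from
# elements that are elided ("...") in the log.
x = np.array([-3.817751e+01, 1.281843e+02, -3.195532e+01])
y = np.array([-3.816167e+01, 1.281905e+02, -3.194344e+01])

# assert_allclose checks |x - y| <= atol + rtol * |y| elementwise.
assert_allclose(x, y, rtol=rtol, atol=atol)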

@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Test, Flaky

@ChaiBapchya
Contributor Author

@mxnet-label-bot add [Test, Flaky]

@ChaiBapchya
Contributor Author

This appears to have been caused by PR #15497.
@zixuanweeei @pengzhao-intel Could you please take a look?
Thanks!

@pengzhao-intel
Contributor

Thanks, @ChaiBapchya. The GPU actually doesn't compute this part accurately.
@zixuanweeei, let's disable the GPU test first and then file a separate issue for the GPU problem.
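For illustration only, here is a minimal sketch of restricting the test to CPU until the GPU numerics are resolved. It assumes the test can read the active context via mx.context.current_context(); the actual change may look different.

import unittest
import mxnet as mx

def test_convolution_independent_gradients():
    # Hypothetical guard: skip on GPU contexts, where convolution gradients
    # differ slightly and trip the rtol=0.01/atol=0.1 check; keep CPU coverage.
    if mx.context.current_context().device_type == 'gpu':
        raise unittest.SkipTest("Flaky on GPU, tracked in #15603")
    # ... original test body runs on CPU ...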

@zixuanweeei
Contributor

@pengzhao-intel No problem. I will put together a new example on GPU before filing an issue.

@zixuanweeei
Contributor

@ChaiBapchya Thanks for reporting this issue. #15661 and #15522 failed before #15631 was merged, right? Let's see whether #15631 fixes it.
