Flaky test test_gluon_rnn.test_layer_bidirectional #13103
Comments
Do you think the rule is too strict? |
@mxnet-label-bot [Gluon, Flaky, Test] |
I'm currently working on some CD pipelines and I'm seeing this issue crop up: I'll create a quick PR to disable it. |
@rongzha1 please take a look at this issue. |
@perdasilva could you elaborate on the settings for these pipelines? Are they failing because of CPU tests or GPU tests? |
@perdasilva It seems the mismatch rate is very low (0.xxx%), while rtol=1e-7 and atol is just 0. I wonder if we could simply bump the tolerances up instead of disabling the test? |
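For context on why rtol=1e-7 with atol=0 is tight: float32 machine epsilon is about 1.19e-7, so two float32 values that differ by a single unit in the last place (ULP) can already exceed that tolerance. The following is a minimal sketch, assuming the compared outputs are float32; the value 0.5 and the helper name `within_tolerance` are illustrative and not taken from the test.

```python
import numpy as np
from numpy.testing import assert_allclose


def within_tolerance(actual, desired, rtol):
    """Return True if assert_allclose accepts the pair with atol=0."""
    try:
        assert_allclose(actual, desired, rtol=rtol, atol=0)
        return True
    except AssertionError:
        return False


# Two adjacent float32 values: y is the next representable number above x.
# (0.5 is an illustrative value, not one from the failing test.)
x = np.array([0.5], dtype=np.float32)
y = np.nextafter(x, np.float32(1.0))

print(np.finfo(np.float32).eps)      # float32 machine epsilon, about 1.19e-07
print(within_tolerance(x, y, 1e-7))  # False: a 1-ULP gap exceeds rtol=1e-7 here
print(within_tolerance(x, y, 2e-7))  # True
```

Under the same assumption, rtol=2e-7 absorbs any single-ULP difference, but a two-ULP difference can still exceed it for some values.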
@szha in the two cases I've linked to, it was tested against a binary compiled with your tools for static linking, and the variants used were cu80mkl and cu92mkl. @haojin2 I'm happy to bump them, but I just wouldn't know what to bump them to =S I'm not familiar with this side of the code and don't really know what reasonable tolerance levels would be. |
@perdasilva Even rtol=2e-7 would suffice. Please run that particular test 10000 times. If you don't know how to do it, I can run it on my side. |
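The idea of repeating a test many times to judge a tolerance can be sketched as follows. This is not the MXNet test or its harness: the matrix product is a hypothetical stand-in for the network, and `trial_passes` and `count_failures` are made-up helper names. Each seed produces one trial, and the loop counts how many trials fail at each candidate rtol.

```python
import numpy as np


def trial_passes(seed, rtol):
    """One seeded trial: compare a float32 result with a float64 reference."""
    rng = np.random.RandomState(seed)
    a = rng.uniform(size=(64, 32)).astype(np.float32)
    b = rng.uniform(size=(32, 16)).astype(np.float32)
    actual = a.dot(b)  # computed in float32
    # Reference computed in float64, then rounded back to float32.
    desired = a.astype(np.float64).dot(b.astype(np.float64)).astype(np.float32)
    # Same criterion assert_allclose uses when atol=0.
    return bool(np.all(np.abs(actual - desired) <= rtol * np.abs(desired)))


def count_failures(trials, rtol):
    """Number of seeds in range(trials) for which the comparison fails."""
    return sum(not trial_passes(seed, rtol) for seed in range(trials))


# Sweep candidate tolerances; raise the trial count for more confidence.
for rtol in (1e-7, 2e-7, 1e-6, 1e-5):
    print(rtol, count_failures(200, rtol))
```

A tolerance is only a reasonable candidate if its failure count stays at zero over a large number of trials; the counts printed here depend on the platform and BLAS build.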
@haojin2 I'll give it a go, and let you know how it goes. Thanks for the help! |
@haojin2 no good:
What I did to the test code:
To test the changes, I have a g3.8xlarge instance with nvidia drivers 418 and nvidia-docker:
|
Could you try |
I'll close my skip_test PR and post my fix test PR =) |
@pengzhao-intel forgot to say thank you. Thank you! =D |
Example failure: https://travis-ci.org/apache/incubator-mxnet/builds/450064964?utm_source=github_status&utm_medium=notification
======================================================================
FAIL: test_gluon_rnn.test_layer_bidirectional
Traceback (most recent call last):
  File "/usr/local/Cellar/numpy/1.14.5/libexec/nose/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/Users/travis/build/apache/incubator-mxnet/tests/python/unittest/common.py", line 106, in test_new
    orig_test(*args, **kwargs)
  File "/Users/travis/build/apache/incubator-mxnet/tests/python/unittest/test_gluon_rnn.py", line 282, in test_layer_bidirectional
    assert_allclose(net(data).asnumpy(), ref_net(data).asnumpy())
  File "/usr/local/lib/python2.7/site-packages/numpy/testing/nose_tools/utils.py", line 1396, in assert_allclose
    verbose=verbose, header=header, equal_nan=equal_nan)
  File "/usr/local/lib/python2.7/site-packages/numpy/testing/nose_tools/utils.py", line 779, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0
(mismatch 0.0649350649351%)
x: array([[[0.682853, 0.674969, 0.547395, ..., 0.997481, 0.998059
0.994295]
[0.652577, 0.653787, 0.478821, ..., 0.997182, 0.996606,...
y: array([[[0.682853, 0.674969, 0.547395, ..., 0.997481, 0.998059
0.994295]
[0.652577, 0.653787, 0.478821, ..., 0.997182, 0.996606,...
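The mismatch percentage in the assertion message can be turned back into a ratio of elements. The sketch below recovers the simplest fraction consistent with the reported 0.0649350649351%; the denominator bound of 10000 is an arbitrary assumption, and the actual output size is not stated in the log.

```python
from fractions import Fraction

# Percentage copied from the assertion message.
reported_percent = 0.0649350649351

# Closest fraction with a denominator of at most 10000 (assumed bound).
ratio = Fraction(reported_percent / 100).limit_denominator(10000)
print(ratio)  # 1/1540
```

So the failure is consistent with a single mismatched element in an output of 1540 values, or the same ratio in a larger array.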