
[Flaky] test_operator_gpu.test_dropout #14288

Open · junrushao opened this issue Feb 28, 2019 · 6 comments

Comments

@junrushao
Member

http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-gpu/branches/PR-14270/runs/5/nodes/271/steps/624/log/?start=0

======================================================================
FAIL: test_operator_gpu.test_dropout
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/local/lib/python2.7/dist-packages/nose/util.py", line 620, in newfunc
    return func(*arg, **kw)
  File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 173, in test_new
    orig_test(*args, **kwargs)
  File "/work/mxnet/tests/python/gpu/../unittest/test_operator.py", line 6107, in test_dropout
    check_dropout_ratio(1.0, shape, cudnn_off=False)
  File "/work/mxnet/tests/python/gpu/../unittest/test_operator.py", line 6040, in check_dropout_ratio
    check_correctness(exe, exe.arg_arrays[0].asnumpy(), ratio)
  File "/work/mxnet/tests/python/gpu/../unittest/test_operator.py", line 6005, in check_correctness
    assert output_zeroes == len(input)
AssertionError: 
-------------------- >> begin captured logging << --------------------
common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=623337751 to reproduce.
--------------------- >> end captured logging << ---------------------
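
For anyone trying to reproduce this outside the test harness, here is a minimal standalone sketch of the failing ratio=1.0 check (the GPU context and flat shape are assumptions, not the test's exact setup; the MXNET_TEST_SEED above applies to the nose runner, not to this script):

    import mxnet as mx

    # With a drop probability of 1.0 in training mode, every output element
    # should be zero. The flaky failure above shows 9999 of 10000 zeros on
    # the cuDNN path.
    shape = (10000,)
    a = mx.nd.random.uniform(shape=shape, ctx=mx.gpu(0))
    with mx.autograd.record(train_mode=True):         # dropout active in train mode
        b = mx.nd.Dropout(a, p=1.0, cudnn_off=False)  # take the cuDNN path
    out = b.asnumpy()
    assert (out == 0).sum() == out.size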
@mxnet-label-bot
Contributor

Hey, this is the MXNet Label Bot.
Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it.
Here are my recommended labels: Test, Flaky

@junrushao junrushao changed the title Flaky test: test_operator_gpu.test_dropout [Flaky] test_operator_gpu.test_dropout Feb 28, 2019
@junrushao
Member Author

@mxnet-label-bot add [Test, Flaky]

@perdasilva
Contributor

Seeing it again: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/master/650/pipeline

Making a PR to disable the test until it is fixed.
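
A disabling change of this kind usually just marks the test as skipped with a pointer back to this issue; a sketch of what that looks like (the decorator choice here is an assumption, not necessarily what the actual PR does):

    import unittest

    @unittest.skip("Flaky, tracked in #14288")
    def test_dropout():
        ...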

@ChaiBapchya
Contributor

Was able to reproduce this issue.

With cudnn_off=True, check_dropout_ratio(1.0, shape) works correctly: assert output_zeroes == len(input) passes, with both values equal to 10000.

However, with cudnn_off=False (i.e. with cuDNN), check_dropout_ratio(1.0, shape) fails at the assertion: output_zeroes is 9999 while len(input) is 10000.

@eric-haibin-lin
Member

@DickJC123 @ptrendx it looks like a bug in the cuDNN dropout implementation for the ratio=1 case
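
For illustration only (this is not the actual cuDNN source): if the dropout mask is built by comparing uniform samples against the drop probability, a single sample landing exactly on the boundary at p = 1.0 survives the mask, which would produce exactly the 9999/10000 pattern reported above:

    import numpy as np

    # Hypothetical mask construction: keep an element when its sample >= p.
    # At p = 1.0 this keeps any element whose sample is exactly 1.0.
    rng = np.random.default_rng(0)
    samples = rng.uniform(0.0, 1.0, size=10000)
    samples[1234] = 1.0                  # simulate one boundary draw
    p = 1.0
    mask = samples >= p                  # True -> keep, False -> drop
    print(mask.size - mask.sum(), "of", mask.size, "elements dropped")  # 9999 of 10000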

leezu added a commit that referenced this issue Nov 17, 2020
Two issues.

Issue 1: #14288

Issue 2:

        def check_passthrough(ratio, shape, cudnn_off=True):
            # test inference_mode forward and then backward
            a = mx.random.uniform(shape=shape)
            a.attach_grad()
            with mx.autograd.record(train_mode=False):
                b = mx.nd.Dropout(a, ratio, cudnn_off=cudnn_off) # dropout acts as identity
            b.backward()
            assert_almost_equal(a.grad.asnumpy(), mx.nd.ones_like(b).asnumpy())

        shape = (100, 100)
        check_dropout_ratio(0.5, shape)
        check_dropout_ratio(0.0, shape)
>       check_dropout_ratio(1.0, shape)
[...]
        # Hopefully should be within ratio/2 %
        error = abs(output_sum - input_sum) / input_sum
        if ratio == 1.0:
>           assert output_zeroes == len(input)
E           assert 9999 == 10000
E             +9999
E             -10000
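
For context, the passthrough check above relies on dropout being the identity in inference mode, so the gradient with respect to the input should be all ones. A standalone version of that check (CPU context assumed) would look like:

    import mxnet as mx

    a = mx.nd.random.uniform(shape=(100, 100))
    a.attach_grad()
    with mx.autograd.record(train_mode=False):   # inference mode: dropout is identity
        b = mx.nd.Dropout(a, 0.5)
    b.backward()
    # d(identity)/da == 1 everywhere
    assert (a.grad.asnumpy() == 1).all()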
leezu added a commit that referenced this issue Nov 25, 2020