[Flaky] test_operator_gpu.test_dropout #14288
Hey, this is the MXNet Label Bot.

@mxnet-label-bot add [Test, Flaky]
Seeing it again: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/master/650/pipeline — making a PR to disable the test until it is fixed.
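For reference, disabling a flaky test until the underlying bug is fixed is usually a one-line decorator. This is an illustrative sketch (not the actual PR's diff; the test body here is a placeholder):

```python
import unittest

# Hypothetical sketch: skip a known-flaky test and point at the tracking issue.
# unittest.skip marks the function so test runners (unittest, pytest) report it
# as skipped instead of executing the flaky assertion.
@unittest.skip("Flaky test; see issue #14288")
def test_dropout():
    # placeholder for the real test body, which would intermittently fail
    raise AssertionError("would be flaky")
```

With pytest, the equivalent is `@pytest.mark.skip(reason=...)`; either way the skip reason should reference the tracking issue so the test is re-enabled once fixed.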
Was able to reproduce this issue. It works correctly with cuDNN disabled (cudnn_off=True); however, with cuDNN enabled it fails at the assertion because the output values are not all zero.
@DickJC123 @ptrendx it looks like a bug in the cuDNN dropout implementation in the ratio=1 case.
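To make the expected semantics concrete, here is a minimal NumPy reference sketch of inverted dropout (an assumption for illustration, not MXNet's or cuDNN's implementation): each element survives with probability 1 - ratio and survivors are scaled by 1/(1 - ratio). At ratio == 1.0 every output element must be exactly zero, which is precisely the property the failing assertion checks:

```python
import numpy as np

def dropout_ref(x, ratio, rng):
    """Reference inverted dropout (illustrative, not MXNet's kernel)."""
    if ratio >= 1.0:
        # Degenerate case: drop everything; output must be all zeros.
        return np.zeros_like(x)
    keep = 1.0 - ratio
    mask = rng.random(x.shape) < keep   # Bernoulli keep-mask
    return x * mask / keep              # scale survivors by 1/keep

rng = np.random.default_rng(0)
x = np.ones((100, 100))

out = dropout_ref(x, 1.0, rng)
# All 10000 elements are zero; the buggy cuDNN path left one nonzero (9999 != 10000).
assert int((out == 0).sum()) == x.size
```

The test's `assert output_zeroes == len(input)` for ratio=1.0 is checking exactly this all-zeros property, so a single surviving element (9999 of 10000 zeroed) indicates the cuDNN path mishandles the degenerate ratio.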
Two issues.

Issue 1: #14288

Issue 2:

```
def check_passthrough(ratio, shape, cudnn_off=True):
    # test inference_mode forward and then backward
    a = mx.random.uniform(shape=shape)
    a.attach_grad()
    with mx.autograd.record(train_mode=False):
        b = mx.nd.Dropout(a, ratio, cudnn_off=cudnn_off)  # dropout acts as identity
    b.backward()
    assert_almost_equal(a.grad.asnumpy(), mx.nd.ones_like(b).asnumpy())

shape = (100, 100)
check_dropout_ratio(0.5, shape)
check_dropout_ratio(0.0, shape)
> check_dropout_ratio(1.0, shape)
[...]
    # Hopefully should be within ratio/2 %
    error = abs(output_sum - input_sum) / input_sum
    if ratio == 1.0:
>       assert output_zeroes == len(input)
E       assert 9999 == 10000
E         +9999
E         -10000
```
http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-gpu/branches/PR-14270/runs/5/nodes/271/steps/624/log/?start=0