wrappers_test.py TimeDistributed may be buggy #7033
Conversation
Here are examples of it flaking:
Still flaky... Perhaps the assertion doesn't work the way it appears to?
…ensions are the same
First run succeeded, closing and reopening to ensure it does not flake.
tests/keras/layers/wrappers_test.py (Outdated)
@@ -97,7 +97,7 @@ def test_TimeDistributed_learning_phase():
     y = wrappers.TimeDistributed(core.Dropout(.999))(x, training=True)
     model = Model(x, y)
     y = model.predict(np.random.random((10, 3, 2)))
-    assert_allclose(0., y, atol=1e-1, rtol=1e-1)
+    assert_allclose(y, np.zeros(y.shape), atol=1e-1, rtol=1e-1)
The order convention is expected, actual. Changing the shape does not affect flakiness (or anything else), since numpy automatically broadcasts. To make the test less flaky you can either increase the dropout or reduce the tolerance.
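For reference, a quick numpy sketch of that broadcasting point (the scalar 0. and an explicit zeros array compare every element the same way):

import numpy as np

# The scalar 0. broadcasts against the whole array, so these two assertions
# are equivalent; changing the shape of the expected value does not change
# what gets compared.
y = np.zeros((10, 3, 2))
np.testing.assert_allclose(0., y, atol=1e-1, rtol=1e-1)
np.testing.assert_allclose(y, np.zeros(y.shape), atol=1e-1, rtol=1e-1)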
I think you might have it backwards; the numpy docs for assert_allclose quoted here say: numpy.testing.assert_allclose(actual, desired, rtol=1e-07, atol=0, equal_nan=True, err_msg='', verbose=True).
A better check would be with completely fixed random seeds for dropout, so the same things are dropped every time. Then dropout could also be 0.5. I think TF can do that; can Theano?
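As a minimal sketch of the fixed-seed idea (assuming the seed argument of keras.layers.Dropout, which both the TensorFlow and Theano backends accept):

# Seed numpy for the input data and seed the Dropout layer itself so the
# same entries are dropped on every run.
np.random.seed(1234)
y = wrappers.TimeDistributed(core.Dropout(0.5, seed=1234))(x, training=True)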
I think a better plan would be to test the value of the average, and use a stricter tolerance. We have a tensor of shape (10, 3, 2) that should be 99.9% zeros, so its average should be very close to zero.
I think that's still flaky, just less flaky; run it enough times and it is guaranteed to fail eventually.
Moved to a fixed seed, ran the test, printed the result, and put that printed value into the code to confirm it stays close to the newly fixed expected value.
Added a separate fix for Theano/TensorFlow since they have different RNGs. I think this solves the flakiness problem while still performing the desired test.
I would propose this simpler solution, which also tests that the dropout layer was in fact applied at predict time:
I'd expect that implementation to flake a bit too often as well; if I mentally work out the probability over 1k test runs, why not zero flakes? The numpy array is there and constant, so I can add a check for dropout too: all values should be zero or the original array value. If you're still not convinced I'll put it as you prefer. :-)
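For illustration, a minimal sketch of such an element-wise check under a fixed seed; note this assumes Keras's inverted dropout, where surviving entries are rescaled by 1 / (1 - rate) rather than kept at their original value:

import numpy as np
from keras.layers import Input, core, wrappers
from keras.models import Model

rate = 0.5
np.random.seed(1234)
data = np.random.random((10, 3, 2))
x = Input(shape=(3, 2))
# Seed the Dropout layer so runs are repeatable.
y = wrappers.TimeDistributed(core.Dropout(rate, seed=1234))(x, training=True)
model = Model(x, y)
out = model.predict(data)
# Each entry is either dropped (0) or kept and rescaled by 1 / (1 - rate).
assert np.all(np.isclose(out, 0.) | np.isclose(out, data / (1. - rate)))
# Confirm dropout actually fired at predict time: at least one entry is zero.
assert np.any(np.isclose(out, 0.))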
Because of the seed. If it works once on each backend, then it will always work. And it will likely work once, since the failure probability is very low.
…eam#7033 This may now be highlighting an actual bug.
Okay, I did that and also increased the size to over 1 million total entries to make sure it is very unlikely to ever flake, but this seems to have run into an actual bug.
Closing because: 1) the PR is stale, and 2) this test isn't related to a bug fix or a change in logic. Tests are meant to be a way to monitor the impact and correctness of future changes in the codebase, and in this regard I don't think this test provides useful monitoring information.
This PR was still open because it is a suspected bug for which no fix exists; perhaps this is expected behavior, or I misunderstood something? I tried running the following again and it still fails 100% of the time, when I expected the flake rate to be around 1 in 1 million:

@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason='cntk does not support dropout yet')
def test_TimeDistributed_learning_phase_big():
    # test layers that need learning_phase to be set
    np.random.seed(1234)
    width = 105
    height = 101
    time = 102
    x = Input(shape=(width, height))
    y = wrappers.TimeDistributed(core.Dropout(.999))(x, training=True)
    model = Model(x, y)
    y = model.predict(np.random.random((time, width, height)))
    np.testing.assert_allclose(np.mean(y), 0., atol=1e-1, rtol=1e-1)

From the run:
This adds a test that illustrates a bug in TimeDistributed, where dropout does not appear to match the expected behavior for inputs with large dimensions. The following test fails:
This PR was originally created for the following flaky test:
a625fcd#diff-a60adf725df8a5eed11441fb09ae7fb0R98