
wrappers_test.py TimeDistributed may be buggy #7033

Closed
wants to merge 6 commits into from

Conversation

@ahundt (Contributor) commented Jun 19, 2017

This adds a test that illustrates a bug in TimeDistributed, where dropout does not appear to match the expected behavior for inputs with large dimensions. The following test fails:

@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason='cntk does not support dropout yet')
def test_TimeDistributed_learning_phase():
    # test layers that need learning_phase to be set
    np.random.seed(1234)
    width = 105
    height = 101
    time = 102
    x = Input(shape=(width, height))
    y = wrappers.TimeDistributed(core.Dropout(.999))(x, training=True)
    model = Model(x, y)
    y = model.predict(np.random.random((time, width, height)))
    np.testing.assert_allclose(np.mean(y), 0., atol=1e-1, rtol=1e-1)

This PR was originally created for the following flaky test:
a625fcd#diff-a60adf725df8a5eed11441fb09ae7fb0R98
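
For context, the expected mean depends on which dropout convention the layer implements. The sketch below is plain numpy, not Keras; the rescaling convention it contrasts is an assumption to check against the actual layer, not something established in this thread:

```python
import numpy as np

# Standalone simulation of dropout at rate 0.999 on an input shaped like
# the test's (time, width, height) = (102, 105, 101): ~1.08M entries.
np.random.seed(1234)
rate = 0.999
x = np.random.random((102, 105, 101))      # uniform in [0, 1), mean ~0.5
keep = np.random.random(x.shape) >= rate   # each entry kept with p = 0.001

# Convention 1: plain masking. The mean is ~0.001 * 0.5 = 0.0005, so the
# mean-near-zero assertion in the test above would comfortably pass.
y_masked = x * keep
print(y_masked.mean())

# Convention 2: "inverted" dropout, which rescales survivors by
# 1 / (1 - rate) so the expected output mean equals the input mean (~0.5).
# Under this convention a mean-near-zero assertion must fail.
y_inverted = x * keep / (1.0 - rate)
print(y_inverted.mean())
```

If the layer uses the inverted convention, a mean around 0.5 for large inputs would be expected behavior rather than a bug, so that is the first thing to rule out.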

@ahundt (Contributor Author) commented Jun 19, 2017

Still flaky... Perhaps the assertion doesn't work the way it appears to?


_____________________ test_TimeDistributed_learning_phase ______________________
[gw1] linux -- Python 3.5.3 /home/travis/miniconda/envs/test-environment/bin/python
@keras_test
    @pytest.mark.skipif((K.backend() == 'cntk'),
                        reason='cntk does not support dropout yet')
    def test_TimeDistributed_learning_phase():
        # test layers that need learning_phase to be set
        x = Input(shape=(3, 2))
        y = wrappers.TimeDistributed(core.Dropout(.999))(x, training=True)
        model = Model(x, y)
        y = model.predict(np.random.random((10, 3, 2)))
>       assert_allclose(0., y, atol=1e-1, rtol=1e-1)
E       AssertionError: 
E       Not equal to tolerance rtol=0.1, atol=0.1
E       
E       (mismatch 1.6666666666666714%)
E        x: array(0.0)
E        y: array([[[   0.      ,    0.      ],
E               [   0.      ,    0.      ],
E               [   0.      ,    0.      ]],...
tests/keras/layers/wrappers_test.py:100: AssertionError

@ahundt ahundt changed the title wrappers_test.py fix tolerance in def test_TimeDistributed_learning_phase() wrappers_test.py test_TimeDistributed_learning_phase is flaky Jun 19, 2017
@ahundt (Contributor Author) commented Jun 19, 2017

First run succeeded, closing and reopening to ensure it does not flake.

@ahundt ahundt closed this Jun 19, 2017
@ahundt ahundt reopened this Jun 19, 2017
@@ -97,7 +97,7 @@ def test_TimeDistributed_learning_phase():
     y = wrappers.TimeDistributed(core.Dropout(.999))(x, training=True)
     model = Model(x, y)
     y = model.predict(np.random.random((10, 3, 2)))
-    assert_allclose(0., y, atol=1e-1, rtol=1e-1)
+    assert_allclose(y, np.zeros(y.shape), atol=1e-1, rtol=1e-1)
Collaborator:

The order convention is expected, actual. Changing the shape does not affect flakiness (or anything else) since numpy automatically broadcasts. To make the test less flaky you can either increase the dropout or reduce the tolerance.
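
A minimal sketch of the broadcasting behavior described here (illustrative values, not from the actual test run): the scalar 0. broadcasts against the whole array, so a single surviving nonzero entry out of 60 is enough to fail the element-wise comparison:

```python
import numpy as np
from numpy.testing import assert_allclose

y = np.zeros((10, 3, 2))
y[0, 0, 0] = 0.9   # hypothetical lone dropout "survivor"

# 0. broadcasts to y's shape, so the check is element-wise: 59 of 60
# entries match exactly, but the single 0.9 exceeds the tolerance
# atol + rtol * |desired| = 0.1 + 0.09 and raises AssertionError.
try:
    assert_allclose(0., y, atol=1e-1, rtol=1e-1)
except AssertionError:
    print("mismatch on 1 of 60 entries")
```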

@ahundt (Contributor Author) commented Jun 19, 2017

I think you might have it backwards; the numpy docs for assert_allclose give the signature as numpy.testing.assert_allclose(actual, desired, rtol=1e-07, atol=0, equal_nan=True, err_msg='', verbose=True).

A better check would be with completely fixed random seeds for dropout so the same things are dropped every time. Then dropout could also be 0.5. I think TF can do that, can theano?

Collaborator:


I think a better plan would be to test the value of the average, and use a stricter tolerance. We have a tensor of shape (10, 3, 2) that should be 99.9% 0s, so its average should be very close to zero.

@ahundt (Contributor Author):


I think that's still flaky, just less flaky; run it enough times and it is effectively guaranteed to fail eventually.
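
As a rough back-of-the-envelope number (plain probability, not a claim about any particular backend's RNG): the chance that at least one of the 60 entries survives dropout at rate 0.999 is small but far from negligible, which is consistent with an occasional flake; whether a given survivor actually exceeds the tolerance then depends on its value and on how the layer rescales it.

```python
# Each of the 10 * 3 * 2 = 60 entries is independently kept with
# probability 1 - 0.999 = 0.001.
n = 10 * 3 * 2
p_keep = 1 - 0.999
p_at_least_one_survivor = 1 - (1 - p_keep) ** n
print(p_at_least_one_survivor)   # ~0.058: roughly one run in seventeen
```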

@ahundt (Contributor Author) commented Jun 19, 2017

Moved to a fixed seed, ran the test, printed the result, and put that printed value into the code to confirm it stays close to the now-fixed expected value.

@ahundt (Contributor Author) commented Jun 19, 2017

Added a separate fix for theano/tensorflow since they have different RNGs. I think this solves the flakiness problem while still performing the desired test.

@fchollet (Collaborator):
I would propose this simpler solution, which also tests that the dropout layer was in fact applied at predict time:

np.random.seed(1234)
...
x = Input(shape=(10, 10))
...
y = wrappers.TimeDistributed(core.Dropout(.999))(x, training=True)
...
y = model.predict(np.random.random((10, 10, 10)))
assert_allclose(0., np.average(y), atol=1e-2)

@ahundt (Contributor Author) commented Jun 20, 2017

I'd expect that implementation to flake a bit too often as well if I mentally work out the probability over 1k test runs. Why not 0 flakes?

The numpy array is there and constant, so I can add a check for dropout too: all values should be zero or the original array value. If you're still not convinced I'll put it as you prefer. :-)

@fchollet (Collaborator):
why not 0 flakes?

Because of the seed. If it works once on each backend, then it will always work. And it will likely work once since the failure probability is very low.

@ahundt (Contributor Author) commented Jun 20, 2017

Okay, I did that and also increased the size to over 1 million total entries to make sure it is very unlikely to ever flake, but this seems to have run into an actual bug.

================================= FAILURES ==================================
____________________ test_TimeDistributed_learning_phase ____________________
[gw0] darwin -- Python 2.7.13 /usr/local/opt/python/bin/python2.7
@keras_test
    @pytest.mark.skipif((K.backend() == 'cntk'),
                        reason='cntk does not support dropout yet')
    def test_TimeDistributed_learning_phase():
        # test layers that need learning_phase to be set
        width = 105
        height = 101
        time = 102
        x = Input(shape=(width, height))
        y = wrappers.TimeDistributed(core.Dropout(.999))(x, training=True)
        model = Model(x, y)
        y = model.predict(np.random.random((time, width, height)))
>       np.testing.assert_allclose(np.average(y), 0., atol=1e-1, rtol=1e-1)
E       AssertionError:
E       Not equal to tolerance rtol=0.1, atol=0.1
E
E       (mismatch 100.0%)
E        x: array(0.48270708322525024, dtype=float32)
E        y: array(0.0)

@ahundt ahundt changed the title wrappers_test.py test_TimeDistributed_learning_phase is flaky wrappers_test.py TimeDistributed may be buggy Jun 21, 2017
@fchollet (Collaborator):
Closing because: 1) PR is stale, 2) this test isn't related to a bug fix or a change in logic. Tests are meant to be a way to monitor the impact and correctness of future changes in the codebase, and in this regard I don't think this test provides useful monitoring information.

@fchollet fchollet closed this Aug 15, 2017
@ahundt (Contributor Author) commented Aug 17, 2017

This PR was still open because it covers a suspected bug for which no fix exists; perhaps this is expected behavior, or I misunderstood something?

I tried running the following again and it still fails 100% of the time, when I expected the flake rate to be around 1 in 1 million:

@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason='cntk does not support dropout yet')
def test_TimeDistributed_learning_phase_big():
    # test layers that need learning_phase to be set
    np.random.seed(1234)
    width = 105
    height = 101
    time = 102
    x = Input(shape=(width, height))
    y = wrappers.TimeDistributed(core.Dropout(.999))(x, training=True)
    model = Model(x, y)
    y = model.predict(np.random.random((time, width, height)))
    np.testing.assert_allclose(np.mean(y), 0., atol=1e-1, rtol=1e-1)

From the run:

/usr/local/bin/py.test tests/keras/layers/wrappers_test.py
================================== test session starts ==================================
platform linux2 -- Python 2.7.12, pytest-3.2.1, py-1.4.34, pluggy-0.4.0 -- /usr/bin/python
cachedir: .cache
rootdir: /home/ahundt/src/keras, inifile: pytest.ini
collected 5 items                                                                        

tests/keras/layers/wrappers_test.py::test_TimeDistributed PASSED
tests/keras/layers/wrappers_test.py::test_TimeDistributed_learning_phase_big FAILED
tests/keras/layers/wrappers_test.py::test_TimeDistributed_learning_phase PASSED
tests/keras/layers/wrappers_test.py::test_regularizers PASSED
tests/keras/layers/wrappers_test.py::test_Bidirectional PASSED

=============================== slowest 10 test durations ===============================
5.53s call     tests/keras/layers/wrappers_test.py::test_Bidirectional
3.33s call     tests/keras/layers/wrappers_test.py::test_TimeDistributed
0.06s call     tests/keras/layers/wrappers_test.py::test_regularizers
0.04s call     tests/keras/layers/wrappers_test.py::test_TimeDistributed_learning_phase_big
0.02s call     tests/keras/layers/wrappers_test.py::test_TimeDistributed_learning_phase
0.00s setup    tests/keras/layers/wrappers_test.py::test_TimeDistributed
0.00s setup    tests/keras/layers/wrappers_test.py::test_TimeDistributed_learning_phase_big
0.00s setup    tests/keras/layers/wrappers_test.py::test_Bidirectional
0.00s teardown tests/keras/layers/wrappers_test.py::test_regularizers
0.00s setup    tests/keras/layers/wrappers_test.py::test_TimeDistributed_learning_phase
======================================= FAILURES ========================================
________________________ test_TimeDistributed_learning_phase_big ________________________

    @keras_test
    @pytest.mark.skipif((K.backend() == 'cntk'),
                        reason='cntk does not support dropout yet')
    def test_TimeDistributed_learning_phase_big():
        # test layers that need learning_phase to be set
        np.random.seed(1234)
        width = 105
        height = 101
        time = 102
        x = Input(shape=(width, height))
        y = wrappers.TimeDistributed(core.Dropout(.999))(x, training=True)
        model = Model(x, y)
        y = model.predict(np.random.random((time, width, height)))
>       np.testing.assert_allclose(np.mean(y), 0., atol=1e-1, rtol=1e-1)
E       AssertionError: 
E       Not equal to tolerance rtol=0.1, atol=0.1
E       
E       (mismatch 100.0%)
E        x: array(0.520167350769043, dtype=float32)
E        y: array(0.0)

tests/keras/layers/wrappers_test.py:126: AssertionError
--------------------------------- Captured stderr call ----------------------------------
2017-08-16 20:16:13.456284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:02:00.0)
=================================== warnings summary ====================================
tests/keras/layers/wrappers_test.py::test_TimeDistributed
  /home/ahundt/.local/lib/python2.7/site-packages/tensorflow/python/ops/gradients_impl.py:95: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
    "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

-- Docs: http://doc.pytest.org/en/latest/warnings.html
==================== 1 failed, 4 passed, 1 warnings in 10.50 seconds ====================
