wrappers_test.py TimeDistributed may be buggy #7033
Conversation
Here are examples of it flaking:
Still flaky... Perhaps the assertion doesn't work the way it appears to?
…ensions are the same
First run succeeded, closing and reopening to ensure it does not flake.
tests/keras/layers/wrappers_test.py (Outdated)
@@ -97,7 +97,7 @@ def test_TimeDistributed_learning_phase():
     y = wrappers.TimeDistributed(core.Dropout(.999))(x, training=True)
     model = Model(x, y)
     y = model.predict(np.random.random((10, 3, 2)))
-    assert_allclose(0., y, atol=1e-1, rtol=1e-1)
+    assert_allclose(y, np.zeros(y.shape), atol=1e-1, rtol=1e-1)
The order convention is expected, actual. Changing the shape does not affect flakiness (or anything else), since numpy automatically broadcasts. To make the test less flaky you can either increase the dropout or reduce the tolerance.
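For reference, a quick numpy sketch of that broadcasting point (the scalar 0. and an explicit zeros array compare every element the same way):

import numpy as np

# The scalar 0. broadcasts against the whole array, so these two assertions
# are equivalent; changing the shape of the expected value does not change
# what gets compared.
y = np.zeros((10, 3, 2))
np.testing.assert_allclose(0., y, atol=1e-1, rtol=1e-1)
np.testing.assert_allclose(y, np.zeros(y.shape), atol=1e-1, rtol=1e-1)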
I think you might have it backwards; the numpy docs for assert_allclose quoted here say: numpy.testing.assert_allclose(actual, desired, rtol=1e-07, atol=0, equal_nan=True, err_msg='', verbose=True).
A better check would be with completely fixed random seeds for dropout, so the same things are dropped every time. Then dropout could also be 0.5. I think TF can do that; can Theano?
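As a minimal sketch of the fixed-seed idea (assuming the seed argument of keras.layers.Dropout, which both the TensorFlow and Theano backends accept):

# Seed numpy for the input data and seed the Dropout layer itself so the
# same entries are dropped on every run.
np.random.seed(1234)
y = wrappers.TimeDistributed(core.Dropout(0.5, seed=1234))(x, training=True)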
I think a better plan would be to test the value of the average, and use a stricter tolerance. We have a tensor of shape (10, 3, 2) that should be 99.9% zeros, so its average should be very close to zero.
I think that's still flaky, just less flaky; run it enough times and it is guaranteed to fail eventually.
Moved to a fixed seed, ran the test, printed the result, and put that printed value into the code to confirm it stays close to the newly fixed expected value.
Added a separate fix for Theano/TensorFlow since they have different RNGs. I think this solves the flakiness problem while still performing the desired test.
I would propose this simpler solution, which also tests that the dropout layer was in fact applied at predict time:
I'd expect that implementation to flake a bit too often as well; if I mentally work out the probability over 1k test runs, why not zero flakes? The numpy array is there and constant, so I can add a check for dropout too: all values should be zero or the original array value. If you're still not convinced I'll put it as you prefer. :-)
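For illustration, a minimal sketch of such an element-wise check under a fixed seed; note this assumes Keras's inverted dropout, where surviving entries are rescaled by 1 / (1 - rate) rather than kept at their original value:

import numpy as np
from keras.layers import Input, core, wrappers
from keras.models import Model

rate = 0.5
np.random.seed(1234)
data = np.random.random((10, 3, 2))
x = Input(shape=(3, 2))
# Seed the Dropout layer so runs are repeatable.
y = wrappers.TimeDistributed(core.Dropout(rate, seed=1234))(x, training=True)
model = Model(x, y)
out = model.predict(data)
# Each entry is either dropped (0) or kept and rescaled by 1 / (1 - rate).
assert np.all(np.isclose(out, 0.) | np.isclose(out, data / (1. - rate)))
# Confirm dropout actually fired at predict time: at least one entry is zero.
assert np.any(np.isclose(out, 0.))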
Because of the seed. If it works once on each backend, then it will always work. And it will likely work once, since the failure probability is very low.
…eam#7033 This may now be highlighting an actual bug.
Okay, I did that and also increased the size to over 1 million total entries to make sure it is very unlikely to ever flake, but this seems to have run into an actual bug.
Closing because: 1) the PR is stale, and 2) this test isn't related to a bug fix or a change in logic. Tests are meant to be a way to monitor the impact and correctness of future changes in the codebase, and in this regard I don't think this test provides useful monitoring information.
This PR was still open because it is a suspected bug for which no fix exists; perhaps this is expected behavior, or I misunderstood something? I tried running the following again and it still fails 100% of the time, when I expected the flake rate to be around 1 in 1 million:

@keras_test
@pytest.mark.skipif((K.backend() == 'cntk'),
                    reason='cntk does not support dropout yet')
def test_TimeDistributed_learning_phase_big():
    # test layers that need learning_phase to be set
    np.random.seed(1234)
    width = 105
    height = 101
    time = 102
    x = Input(shape=(width, height))
    y = wrappers.TimeDistributed(core.Dropout(.999))(x, training=True)
    model = Model(x, y)
    y = model.predict(np.random.random((time, width, height)))
    np.testing.assert_allclose(np.mean(y), 0., atol=1e-1, rtol=1e-1)

From the run:
This adds a test that illustrates a bug in TimeDistributed, where dropout does not appear to match the expected behavior for inputs with large dimensions. The following test fails:
This PR was originally created for the following flaky test:
a625fcd#diff-a60adf725df8a5eed11441fb09ae7fb0R98