This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fix flaky test test_operator_gpu.test_spatial_transformer_with_type (#7645) #11444

Merged
szha merged 1 commit into apache:master from ddavydenko:flaky-fix-7645 on Jun 28, 2018

Conversation

ddavydenko
Contributor


Description

Modified the test_operator_gpu.test_spatial_transformer_with_type test to remove its flakiness.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • removed the predefined seed from the test
  • updated the precision of the data type used in the comparison from float32 to float64 (a sketch of this kind of change is shown below this list)
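
For context, here is a minimal sketch of the kind of change described above. It is not the actual test code from this PR; it assumes the mxnet.test_utils.check_consistency pattern commonly used in tests/python/gpu/test_operator_gpu.py, and the network shapes and names are illustrative only.

```python
import numpy as np
import mxnet as mx
from mxnet.test_utils import check_consistency

def spatial_transformer_consistency_sketch():
    # Build a SpatialTransformer whose localisation network predicts the
    # 6 parameters of an affine transform.
    data = mx.sym.Variable('data')
    loc = mx.sym.FullyConnected(data=mx.sym.Flatten(data=data), num_hidden=6)
    sym = mx.sym.SpatialTransformer(data=data, loc=loc,
                                    target_shape=(10, 10),
                                    transform_type='affine',
                                    sampler_type='bilinear')

    # No mx.random.seed(...) call here: fresh random inputs are drawn on
    # every run instead of relying on one fixed seed.
    ctx_list = [
        # Compare GPU vs CPU outputs in float64 so that rounding error
        # cannot accumulate past the comparison tolerance.
        {'ctx': mx.gpu(0), 'data': (1, 5, 10, 10),
         'type_dict': {'data': np.float64}},
        {'ctx': mx.cpu(0), 'data': (1, 5, 10, 10),
         'type_dict': {'data': np.float64}},
    ]
    check_consistency(sym, ctx_list)
```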

Comments

I confirmed that the original test code caused the test to fail consistently within 100 iterations, with the error specified in the original issue. After the code modification I ran 100,000 iterations and did not observe a single failure.
PYTHONPATH=./python/ MXNET_TEST_COUNT=100000 nosetests --verbose tests/python/gpu/test_operator_gpu.py:test_spatial_transformer_with_type was used to run the test.

@marcoabreu
Contributor

Hi, thanks a lot for fixing this test!

I'm a bit concerned that changing the data type leads to the code taking a different path and that the issue persists with float32. Could another reviewer please help out here? @szha @eric-haibin-lin

@szha
Member

szha commented Jun 28, 2018

@marcoabreu this operator has a type template in it, so it's the same code with a different type (and thus different numeric precision)

@szha szha merged commit f7a0025 into apache:master Jun 28, 2018
@ddavydenko ddavydenko deleted the flaky-fix-7645 branch June 29, 2018 02:41
@anirudh2290
Member

What is the reason for failure with float32?

@ddavydenko
Contributor Author

Probably CO, but anyway... My guess is that there are some cumulative arithmetic operations in which, due to rounding, the precision of the result slowly deteriorates. With the increased precision of the data type, this loss cannot accumulate fast enough to push the difference past the error threshold used when comparing the results against the test data.
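
A standalone numpy illustration (not MXNet or test code) of the effect described above: accumulating many float32 values sequentially lets rounding error grow, while a float64 accumulation of the same data stays essentially exact.

```python
import numpy as np

# Sum the same random data with a sequential float32 accumulator and with
# float64, then compare.  The gap shows how rounding error builds up in the
# lower-precision path.
rng = np.random.default_rng(0)
values = rng.random(300_000)

ref = values.astype(np.float64).sum()        # effectively exact reference

acc32 = np.float32(0.0)
for v in values.astype(np.float32):          # naive sequential accumulation
    acc32 += v

rel_err = abs(float(acc32) - ref) / ref
print(f"relative error of the float32 accumulation: {rel_err:.2e}")
# Typically on the order of 1e-5 here, already larger than a strict float32
# comparison threshold, while a float64 accumulation stays around 1e-16.
```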

XinYao1994 pushed a commit to XinYao1994/incubator-mxnet that referenced this pull request on Aug 29, 2018
4 participants