This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[BUGFIX] Fix flaky test #19197 by avoiding case that 0.45 mapped to 0.5 #19201

Merged 1 commit into apache:v1.x on Sep 21, 2020

Conversation

kpuatamazon
Contributor

Description

This test checks a quantizer by comparing it with other MXNet operators like round. There's one ambiguous case: what to do with 0.5, 1.5, -0.5, etc. Implementations can disagree (and floating-point rounding might push things to either side a bit), and that's OK. So to make things consistent, I pushed random values in (0.45, 0.55) away from 0.5 by adding... 0.05. That was dumb, because it just mapped values in (0.45, 0.55) to (0.5, 0.6), which still had the same rounding issue and triggered flaky test #19197. This PR remaps (0.45, 0.55) to (0.65, 0.75), ensuring no value is near 0.5 when it is rounded. Then quantized values can be compared exactly.
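
To illustrate the idea, here is a minimal sketch in plain NumPy rather than MXNet. It is not the test's actual code: the function names are hypothetical, and it assumes the check is made on each value's fractional part. It shows two common tie-breaking rules disagreeing on exact halves, and agreeing again once values near a tie are shifted by 0.2.

```python
import numpy as np


def round_half_even(x):
    """Round ties to the nearest even integer (NumPy's default rule)."""
    return np.round(x)


def round_half_away(x):
    """Round ties away from zero, as some implementations do."""
    return np.sign(x) * np.floor(np.abs(x) + 0.5)


def push_away_from_half(values, shift):
    """Add `shift` to values whose fractional part lies in (0.45, 0.55).

    Hypothetical helper: with shift=0.2 the band (0.45, 0.55) lands in
    (0.65, 0.75); with shift=0.05 it lands in (0.5, 0.6), still next to the tie.
    """
    values = np.asarray(values, dtype=np.float64)
    frac = values - np.floor(values)
    near_tie = (frac > 0.45) & (frac < 0.55)
    return np.where(near_tie, values + shift, values)


ties = np.array([0.5, 1.5, -0.5, 2.5])

# On exact ties the two rules can give different integers.
disagree_before = round_half_even(ties) != round_half_away(ties)

# After moving the values clear of the tie, both rules agree.
moved = push_away_from_half(ties, shift=0.2)
disagree_after = round_half_even(moved) != round_half_away(moved)
```

Shifting by a fixed amount keeps the remapped values at least 0.05 from any tie, so small floating-point differences between implementations cannot flip the rounded result.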

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Fix flaky test

Comments

There will be another PR for master.

@mxnet-bot

Hey @kpuatamazon , Thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [clang, website, unix-cpu, sanity, unix-gpu, windows-cpu, edge, centos-gpu, miscellaneous, windows-gpu, centos-cpu]


Note:
Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@kpuatamazon
Contributor Author

@mxnet-bot run ci [unix-gpu]

Not my errors, appears to be GPU test failures https://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-gpu/branches/PR-19201/runs/1/nodes/354/steps/721/log/?start=0

[2020-09-21T12:11:00.603Z] -- Set TVM_LLVM_VERSION=
[2020-09-21T12:11:00.603Z] -- Configuring incomplete, errors occurred!
[2020-09-21T12:11:00.603Z] See also "/tmp/tvm/build/CMakeFiles/CMakeOutput.log".
[2020-09-21T12:11:00.603Z] See also "/tmp/tvm/build/CMakeFiles/CMakeError.log".
[2020-09-21T12:11:00.603Z] Makefile:20: recipe for target 'all' failed
[2020-09-21T12:11:00.603Z] make: *** [all] Error 1
[2020-09-21T12:11:00.603Z] Traceback (most recent call last):
[2020-09-21T12:11:00.603Z]   File "setup.py", line 10, in <module>
[2020-09-21T12:11:00.603Z]     from setuptools import find_packages
[2020-09-21T12:11:00.603Z] ImportError: No module named setuptools
[2020-09-21T12:11:00.603Z] /tmp/tvm
[2020-09-21T12:11:00.603Z] Traceback (most recent call last):
[2020-09-21T12:11:00.603Z]   File "setup.py", line 7, in <module>
[2020-09-21T12:11:00.603Z]     from setuptools import find_packages
[2020-09-21T12:11:00.603Z] ImportError: No module named setuptools
[2020-09-21T12:11:00.603Z] /tmp/tvm
[2020-09-21T12:11:01.959Z] Removing intermediate container 0fab8caf9769
[2020-09-21T12:11:01.959Z]  ---> 71c43069a919
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_parameter_invalid_access ... ok (0.0028s)
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_batchnorm_16c ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=1604605631 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=1604605631 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_bulking_gluon_gpu ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=139415649 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=139415649 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/14970
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_deconv2d_16c ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=1066940734 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=1066940734 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_reshape_batchnorm ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=66963173 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=66963173 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_reshape_batchnorm_reshape_batchnorm ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=224810124 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=224810124 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_reshape_batchnorm_slice_batchnorm ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=1568844484 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=1568844484 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_reshape_conv_reshape_conv ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=221214508 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=221214508 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_reshape_deconv ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=1323171665 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=1323171665 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_reshape_deconv_reshape_deconv ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=99759348 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=99759348 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_reshape_deconv_slice_deconv ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=800581065 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=800581065 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_reshape_pooling2d ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=910275680 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=910275680 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_reshape_pooling2d_reshape_pooling2d ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=1169201795 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=1169201795 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_reshape_pooling2d_slice_pooling2d ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=1444564365 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=1444564365 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_slice_batchnorm_slice_batchnorm ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=1784280898 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=1784280898 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_slice_conv_reshape_conv ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=1835373627 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=1835373627 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_slice_deconv ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=80417768 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=80417768 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_slice_deconv_reshape_deconv ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=1099418729 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=1099418729 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_slice_deconv_slice_deconv ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=689481428 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=689481428 to reproduce.
[2020-09-21T12:17:50.100Z] SKIP: skippping temporarily, tracked by https://github.com/apache/incubator-mxnet/issues/11164
[2020-09-21T12:17:50.100Z] test_gluon_gpu.test_slice_pooling2d_reshape_pooling2d ... [WARNING] Error seen with seeded test, use MXNET_TEST_SEED=55754630 to reproduce.
[2020-09-21T12:17:50.100Z] Error seen with seeded test, use MXNET_TEST_SEED=55754630 to reproduce.

@mxnet-bot

Jenkins CI successfully triggered : [unix-gpu]

@kpuatamazon
Contributor Author

kpuatamazon commented Sep 21, 2020

@mxnet-bot run ci [unix-cpu]

And unix-cpu is failing in the R test due to #19169:

[2020-09-21T11:58:11.162Z] --------------------------------------------------------------------------------
[2020-09-21T11:58:11.162Z] test_img_seg.R:157: error: UNET
[2020-09-21T11:58:11.162Z] could not find function "mx.ctx.default"
[2020-09-21T11:58:11.162Z] --------------------------------------------------------------------------------
[2020-09-21T11:58:15.333Z] 
/ |   0       | initializer

https://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-cpu/branches/PR-19201/runs/1/nodes/303/steps/764/log/?start=0

@mxnet-bot

Undefined action detected.
Permissible actions are : run ci [all], run ci [job1, job2]
Example : @mxnet-bot run ci [all]
Example : @mxnet-bot run ci [centos-cpu, clang]

@kpuatamazon kpuatamazon mentioned this pull request Sep 21, 2020
@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu]

@kpuatamazon
Contributor Author

@mxnet-bot run ci [unix-cpu]

Fun fact: the bot is not idempotent and editing a message will cause it to try again, interrupt the existing run, and report failure because it interrupted the previous run.

@mxnet-bot

Jenkins CI successfully triggered : [unix-cpu]

Contributor

@samskalicky samskalicky left a comment

Thanks for the fix!

@samskalicky samskalicky merged commit 4e46b51 into apache:v1.x Sep 21, 2020
@samskalicky
Contributor

@kpuatamazon does this need to be committed to master branch too?

@kpuatamazon
Contributor Author

> @kpuatamazon does this need to be committed to master branch too?

#19202

@kpuatamazon kpuatamazon deleted the flakytest-v1.x branch September 28, 2020 09:13