Fixes #17304 Flaky Test -> test_higher_order_grad.test_tanh #17321

kshitij12345 · 2020-01-15T10:19:36Z

Going through the code for _backward_tanh, it is implemented as 1 - (tanh^2(x)), which is equivalent to sech^2(x) or (1/cosh(x))^2.

However, for the failed seed, I have verified that 1 - (tanh^2(x)) does not come out numerically equal to (1/cosh(x))^2 for the randomly generated array of dim 4 (I will try to investigate the cause further).

For now, replacing grad_op with the equivalent expression 1 - tanh^2(x) appears to work without the need to relax the tolerance.
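
The gap between the two forms can be explored outside MXNet. The following is a minimal NumPy sketch, not the test code from this PR: the helper name `max_abs_diff`, the seed, and the array shape are assumptions chosen only for illustration, and the seed is not the failing one.

```python
import numpy as np

def max_abs_diff(x):
    """Largest elementwise gap between two algebraically equal forms."""
    tanh_form = 1 - np.tanh(x) ** 2      # 1 - tanh^2(x)
    cosh_form = (1 / np.cosh(x)) ** 2    # sech^2(x) = (1/cosh(x))^2
    return float(np.max(np.abs(tanh_form - cosh_form)))

rng = np.random.default_rng(0)           # arbitrary seed (assumption)
x64 = rng.standard_normal((2, 3, 4, 5))  # a random 4-dimensional array
x32 = x64.astype(np.float32)             # same values at lower precision

print("float64 gap:", max_abs_diff(x64))
print("float32 gap:", max_abs_diff(x32))
```

In exact arithmetic both gaps would be zero. In floating point the two forms round differently, so a sufficiently tight tolerance may tell them apart, more readily at float32 than at float64.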

codecov-io · 2020-01-15T12:53:09Z

Codecov Report

Merging #17321 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #17321   +/-   ##
=======================================
  Coverage    67.5%    67.5%           
=======================================
  Files         275      275           
  Lines       31227    31227           
  Branches     4721     4721           
=======================================
  Hits        21080    21080           
  Misses       8780     8780           
  Partials     1367     1367

Last update 05a0e5b...ecac9a8.

haojin2 · 2020-01-15T17:44:02Z

@kshitij12345 Have you tried to run this new version with > 500 trials?

kshitij12345 · 2020-01-15T17:49:26Z

@haojin2 Hi, can you tell me how I can do that?

haojin2 · 2020-01-15T17:54:42Z

@kshitij12345 like this:
MXNET_TEST_COUNT=500 nosetests tests/python/unittest/test_higher_order_grad.py:test_tanh
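
To illustrate what a trial count like this does, here is a rough sketch of a decorator that reruns a test once per trial with a fresh seed. It is an assumption-laden illustration, not MXNet's actual test harness: `repeat_with_seeds` is a hypothetical name, and only the `MXNET_TEST_COUNT` environment variable comes from the command above.

```python
import functools
import os
import random

def repeat_with_seeds(test_fn):
    """Hypothetical helper: run test_fn MXNET_TEST_COUNT times, reseeding each time."""
    @functools.wraps(test_fn)
    def wrapper(*args, **kwargs):
        trials = int(os.environ.get("MXNET_TEST_COUNT", "1"))
        for _ in range(trials):
            seed = random.randint(0, 2**31 - 1)
            random.seed(seed)            # make this trial reproducible
            try:
                test_fn(*args, **kwargs)
            except AssertionError:
                # Report the seed so a flaky failure can be replayed.
                print(f"trial failed with seed {seed}")
                raise
    return wrapper
```

Running many trials this way makes rare, seed-dependent failures much more likely to surface before merge.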

kshitij12345 · 2020-01-15T18:19:10Z

@haojin2 Thank You.

That helped: the first-order test failed once. I have relaxed the tolerance for the first-order check to rtol=1e-6 and atol=1e-6.

After that, I ran
MXNET_TEST_COUNT=500 nosetests tests/python/unittest/test_higher_order_grad.py:test_tanh
multiple times, and it succeeded each time.
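
For readers unfamiliar with how rtol and atol combine, here is a small sketch using NumPy's criterion, |actual - desired| <= atol + rtol * |desired|. MXNet's own test utility may apply a different formula; the helper name `within_tolerance`, the stricter 1e-7 values, and the error size are assumptions for illustration only.

```python
import numpy as np

def within_tolerance(actual, desired, rtol, atol):
    # NumPy's criterion: |actual - desired| <= atol + rtol * |desired|
    return bool(np.allclose(actual, desired, rtol=rtol, atol=atol))

desired = np.array([0.5])
actual = desired + 4e-7                  # hypothetical rounding-sized error

# Allowed gap is 1e-7 + 1e-7 * 0.5 = 1.5e-7, so a 4e-7 error is rejected.
print(within_tolerance(actual, desired, rtol=1e-7, atol=1e-7))  # False
# Allowed gap is 1e-6 + 1e-6 * 0.5 = 1.5e-6, so the same error is accepted.
print(within_tolerance(actual, desired, rtol=1e-6, atol=1e-6))  # True
```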

haojin2

LGTM, will merge after CI passes

apeforest

Looks like there is some numerical instability due to precision. Please re-run your tests multiple times as @haojin2 suggested. Otherwise, LGTM. Thanks for your prompt response.

apeforest · 2020-01-15T18:21:51Z

@kshitij12345 Could you please also do so for other operators? I suspect they might have similar issues as well. Thanks!

kshitij12345 · 2020-01-15T18:25:27Z

@apeforest @haojin2 Yes, I have seen this happening for arcsin and arctanh as well as #16739.

haojin2 · 2020-01-15T22:15:04Z

Merged, please continue with fixing other flaky higher-order gradient tests @kshitij12345

kshitij12345 added 3 commits January 15, 2020 15:11

use failed seed and verify first order

622248f

replace grad_op with equivalent expression

30294f1

remove fixed seed for tanh

ecac9a8

kshitij12345 mentioned this pull request Jan 15, 2020

Fix Flaky Test Higher Order Grad #17325

Merged

kshitij12345 requested review from apeforest and reminisce January 15, 2020 14:25

haojin2 added Flaky Test labels Jan 15, 2020

add relax tolerance for tanh first order

32a060d

haojin2 approved these changes Jan 15, 2020

apeforest approved these changes Jan 15, 2020

haojin2 merged commit be9e17e into apache:master Jan 15, 2020
