
fix flaky test: test_broadcast_binary_op #11875

Merged

Conversation

@azai91 (Contributor) commented Jul 24, 2018

Description

Fix the flaky test test_operator.py: test_broadcast_binary_op by casting inputs to float32.
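
For context, a minimal standalone sketch (not part of this PR's diff; the values are chosen purely for illustration) of why these checks are precision sensitive: as the in-diff comment later in this conversation notes, '%' is sensitive to the precision of the calculation, so evaluating the same remainder at different precisions can give different results, and a test that compares an mxnet result against a numpy reference at a mismatched precision can fail for unlucky random inputs.

import numpy as np

# Illustration only (arbitrary values): the same nominal inputs can give
# different remainders when the modulo is evaluated in float32 vs float64,
# which is why the test casts its inputs to a single dtype before comparing
# against the numpy reference.
a = np.float64(1e7) + np.float64(0.3)
b = np.float64(0.7)
r32 = np.float32(a) % np.float32(b)  # remainder computed in float32
r64 = a % b                          # remainder computed in float64
print(r32, r64)                      # the two results typically do not match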

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • [ ] Fix the flaky test test_operator.py: test_broadcast_binary_op by casting inputs to float32

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@azai91 (Contributor, Author) commented Jul 24, 2018

#11838 @larroy

@eric-haibin-lin (Member) commented:

Could you resolve conflicts?

@anirudh2290 added the pr-awaiting-response (PR is reviewed and waiting for contributor to respond), Flaky, and Test labels on Aug 9, 2018
@larroy (Contributor) commented Aug 17, 2018

Was this broken again? I had fixed it. Sorry, I have déjà vu now.

# doubles as well. This was a flaky test before when using float32. seed 1688524483, 1768433044
a = mx.sym.cast(a_, dtype='float64')
b = mx.sym.cast(b_, dtype='float64')
mx.sym.cast(b, dtype='float64')
@larroy (Contributor) commented Aug 17, 2018

Extra cast on line 1919?

@azai91 (Contributor, Author) commented Aug 17, 2018

test_bmod is still commented out as flaky.

@azai91 (Contributor, Author) commented Aug 17, 2018

You added the fix for test_binary_op; test_broadcast_binary_op has the same issue.

@larroy (Contributor) commented Aug 17, 2018

Got it 👍🏼 Can you remove the extra cast, or does it have a purpose / side effect?

@larroy (Contributor) commented Aug 17, 2018

👍🏼 (LGTM)

@larroy (Contributor) left a comment

👍🏼

a = mx.sym.cast(a_, dtype='float64')
b = mx.sym.cast(b_, dtype='float64')
# '%' is sensitive to the precision of the calculation. Force numpy to match mxnet's float32.
#check_binary_op_forward(c, lambda a, b: np.float32(a) % np.float32(b), gen_binary_data)

Does having the commented-out check_binary_op call here help the reader reason about the rest of the test? Is it needed?

@@ -1911,10 +1911,17 @@ def test_bdiv(a, b):
check_binary_op_forward(c, lambda a, b: a / b, gen_broadcast_data, mx_nd_func=mx.nd.divide)
check_binary_op_backward(c, lambda g_out, a, b: (g_out / b, - g_out * a / (b * b)), gen_broadcast_data)

def test_bmod(a, b):
def test_bmod(a_, b_):

Are you using a and a_ on purpose? It seems like sometimes you are using a and b, while other times you use a_ and b_.

@KellenSunderland (Contributor) commented Aug 20, 2018

I think a_ and b_ are placeholders here before the sym.cast operator is applied to them and they become a and b. This enforces that they'll be float64s before the broadcast_mod op is applied, which would otherwise cause numerical issues. Then we compare the result of the broadcast_mod (i.e. the c variable) with the python and imperative versions (which are both passed as lambdas to the check).
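
To make that concrete, a minimal sketch of the pattern being described (the variable names mirror the test, but this is an illustration rather than the exact test code):

import mxnet as mx

# a_ and b_ are the raw input symbols (placeholders); a and b are their
# float64-cast versions, so broadcast_mod is evaluated at a precision where
# the mxnet result and the numpy reference agree.
a_ = mx.sym.Variable('a')
b_ = mx.sym.Variable('b')
a = mx.sym.cast(a_, dtype='float64')
b = mx.sym.cast(b_, dtype='float64')
c = mx.sym.broadcast_mod(a, b)
# The forward output of c is then checked against the plain Python
# expression (lambda x, y: x % y) and the imperative mx.nd equivalent.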

@KellenSunderland (Contributor) left a comment

LGTM, thanks for addressing comments.

@eric-haibin-lin eric-haibin-lin merged commit c88b8ee into apache:master Aug 26, 2018
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request Sep 19, 2018
* cast inputs to f32

* retrigger

* retrigger

* remove extra cast

* remove commented out function

* retrigger
Labels
Flaky, pr-awaiting-response (PR is reviewed and waiting for contributor to respond), Test

6 participants