Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[WIP] Fix the unit test for depthwise convolution #14452

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

TaoLv
Copy link
Member

@TaoLv TaoLv commented Mar 17, 2019

Description

Should fix #14052 .

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@karan6181
Copy link
Contributor

@mxnet-label-bot add [Test, Python, Flaky, pr-work-in-progress]

@TaoLv
Copy link
Member Author

TaoLv commented Mar 20, 2019

Looks like the issue is still there and flaky. New MKL-DNN doesn't mitigate the problem. @pengzhao-intel @szha

@ZhennanQin
Copy link
Contributor

Looked into the CI result, test_depthwise_convolution can pass now. The failures seem don't relate to test_depthwise_convolution. Please try to retrigger CI.

@TaoLv
Copy link
Member Author

TaoLv commented Mar 20, 2019

CI was triggered 3 times in this PR and test_depthwise_convolution failed twice in them:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-gpu/detail/PR-14452/1/pipeline#step-178-log-2982
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-gpu/detail/PR-14452/3/pipeline/#step-156-log-2968

As the the second failure happened on python3: GPU Win, I suspect that the issue was not caused by MKL-DNN. Besides seems it's not the same issue descried in #14052 @mseth10 .

@mseth10
Copy link
Contributor

mseth10 commented Mar 20, 2019

This seems like a new failure and needs to be investigated, it was not there when I raised the issue #14052 .
Are we done with MKLDNN fix? What about the other CI failures?

@pengzhao-intel
Copy link
Contributor

Maybe @ptrendx can help too :)

@piyushghai
Copy link
Contributor

@TaoLv Were you able to figure out the test failures here ?

@karan6181
Copy link
Contributor

@TaoLv Could you please provide an update on this PR? It's being idle for sometime now. Thanks!

@pengzhao-intel
Copy link
Contributor

@TaoLv any plan on this? Still on 0.20 or 1.0?

@roywei
Copy link
Member

roywei commented Aug 19, 2019

I see the issue is closed, please close if not required

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

flaky test: test_operator.test_depthwise_convolution
9 participants