Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Improve static cached_op optimization #15187

Merged
merged 3 commits into from
Jun 13, 2019
Merged

Conversation

xinyu-intel
Copy link
Contributor

@xinyu-intel xinyu-intel commented Jun 8, 2019

fix along with #14931

@pengzhao-intel @ZhennanQin

Description

(Brief description on what this PR is about)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

Change-Id: If90c6f0997548ffd5daa67cc18bab7405f24213b
@@ -595,7 +595,10 @@ inline bool CheckAndInferShape(nnvm::Graph* p_g, mxnet::ShapeVector&& shapes,
*contain_unknown = false;
}
nnvm::Graph& g = *p_g;
if (g.attrs.count("shape")) {
if (use_inputs) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add the comments in the code

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is removed from previous commit by mistake. This is just to recover them back.

@piyushghai
Copy link
Contributor

Thanks for your contributions @xinyu-intel .
Can you also look into the CI failures ?

@mxnet-label-bot Add [pr-awaiting-review, Operator, Backend]

@marcoabreu marcoabreu added Backend Issues related to the backend of MXNet Operator pr-awaiting-review PR is waiting for code review labels Jun 9, 2019
@pengzhao-intel
Copy link
Contributor

Please also try dmlc/gluon-nlp#690

@ZhennanQin
Copy link
Contributor

@pengzhao-intel I've tested dmlc/gluon-nlp#690, and everything works fine.

@pengzhao-intel
Copy link
Contributor

pengzhao-intel commented Jun 11, 2019

@pengzhao-intel I've tested dmlc/gluon-nlp#690, and everything works fine.

Thanks :)
Any possible to create tests? I see the mistake operation in previous CI can't be caught.
Do you think the new feature of benchmark OP (#14977) can help for the performance test?

@@ -290,7 +290,7 @@ bool CachedOp::CheckDynamicShapeExists(const Context& default_ctx,
CheckAndInferShape(&g, std::move(shape_inputs), true,
{0, 0}, {0, 0},
&contain_dynamic_shape);
if (contain_dynamic_shape && erase_result) {
if (!config_.static_shape && erase_result) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like something wasn't properly tested here? Maybe add a test to cover this path?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already covered by current UT, there're 2 cases failed before fixing this.

@roywei
Copy link
Member

roywei commented Jun 12, 2019

@PatricZhao @ZhennanQin Is this good to go?
We recently observed static alloc regression on some sue cases. One of the case is reported here #15067.
another is here: https://lists.apache.org/thread.html/7263d1be69d7e6f07b81c28fc1d209f30bdd3d36ea860ebbffddf96d@%3Cdev.mxnet.apache.org%3E

Although we are not sure if this fix is related, can we add this to 1.5.0 anyway since #14931 is already in 1.5.0.rc0 ?
The vote is cancelled, and pending fix for horovod build. So we have some time to merge this.
Thanks!

@pengzhao-intel
Copy link
Contributor

@PatricZhao @ZhennanQin Is this good to go?
We recently observed static alloc regression on some sue cases. One of the case is reported here #15067.
another is here: https://lists.apache.org/thread.html/7263d1be69d7e6f07b81c28fc1d209f30bdd3d36ea860ebbffddf96d@%3Cdev.mxnet.apache.org%3E

Although we are not sure if this fix is related, can we add this to 1.5.0 anyway since #14931 is already in 1.5.0.rc0 ?
The vote is cancelled, and pending fix for horovod build. So we have some time to merge this.
Thanks!

Agree! I think the PR is ready to merge.

Copy link
Contributor

@pengzhao-intel pengzhao-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@pengzhao-intel pengzhao-intel merged commit 09202f7 into apache:master Jun 13, 2019
@pengzhao-intel
Copy link
Contributor

@roywei merged, please take it into 1.5 and try if the regression issue is fixed.
Feel free to ping us if anything needs our help.

roywei pushed a commit to roywei/incubator-mxnet that referenced this pull request Jun 13, 2019
* Fix cached op

Change-Id: If90c6f0997548ffd5daa67cc18bab7405f24213b

* Fix UT

* trigger
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* Fix cached op

Change-Id: If90c6f0997548ffd5daa67cc18bab7405f24213b

* Fix UT

* trigger
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Backend Issues related to the backend of MXNet Operator pr-awaiting-review PR is waiting for code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants