
Register fake grad to subgraph and quantized operators #14275

Merged

reminisce merged 9 commits into apache:master on Mar 6, 2019

Conversation

xinyu-intel (Contributor)

Description

Motivation:
Register fake grad to subgraph and quantized operators to support loading JSON files that contain inference_only operators back as a SymbolBlock for Gluon inference.
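A minimal sketch of the idea, not the exact diff in this PR: an inference-only operator gets a dummy nnvm::FGradient attribute so that graph construction does not fail when the exported JSON is imported back, e.g. as a gluon SymbolBlock. The FakeGradient helper, the empty gradient entries, and the choice of quantize_v2 as the example operator are illustrative assumptions only.

    #include <nnvm/node.h>
    #include <nnvm/op.h>
    #include <nnvm/op_attr_types.h>
    #include <vector>

    // Illustrative "fake" gradient: returns one placeholder (empty) gradient
    // entry per input. It exists only so the FGradient attribute is present;
    // the operator is inference-only and this function is never expected to run.
    inline std::vector<nnvm::NodeEntry> FakeGradient(
        const nnvm::NodePtr& n,
        const std::vector<nnvm::NodeEntry>& ograds) {
      return std::vector<nnvm::NodeEntry>(n->inputs.size());
    }

    // Attach the fake gradient to an inference-only operator
    // (quantize_v2 is used here purely as an example).
    NNVM_REGISTER_OP(_contrib_quantize_v2)
    .set_attr<nnvm::FGradient>("FGradient", FakeGradient);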

@pengzhao-intel @TaoLv @ZhennanQin @reminisce

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@pengzhao-intel (Contributor)

This is a temporary solution to enable the GluonCV INT8 flow; we will revert it after the improvement of CachedOP is done.

@reminisce (Contributor) left a review comment

LGTM. One suggestion: please add a `TODO(owner's name): some statement` in the code as a reminder for future maintenance.

@TaoLv (Member) commented Feb 28, 2019

Is it possible to merge this PR into #14276? @xinyu-intel @ZhennanQin @pengzhao-intel

@ZhennanQin (Contributor)

@TaoLv I suggest not merging them together, because this PR is a workaround while #14276 isn't. Then we can simply revert this PR when the cached_op refactoring is done.

@TaoLv (Member) commented Feb 28, 2019

@ZhennanQin I'm afraid this PR has side effects if it's merged before #14276. Reverting should not be a big deal, as it only changes 9 lines; we always need a PR to revert changes anyway.

@ZhennanQin (Contributor)

@TaoLv Yes, there is a side effect if this is merged before #14276, so the correct merge order is to merge #14276 first. If you think reverting is not a big deal, I'm fine with making them into the same PR, although I don't see any benefit from it.

@TaoLv (Member) commented Feb 28, 2019

Avoiding side effects and keeping the master branch healthy is the benefit.

@ZhennanQin (Contributor)

Merging in the correct order has the same benefit :)

@ZhennanQin (Contributor)

@xinyu-intel Please fold #14276 into this PR as @TaoLv suggests.

@anirudhacharya (Member)

@mxnet-label-bot add [pr-awaiting-review]

@marcoabreu added the pr-awaiting-review (PR is waiting for code review) label on Mar 1, 2019
@pengzhao-intel (Contributor)

@TaoLv please help review again :)

@TaoLv (Member) left a review comment

LGTM. Just a minor comment.

@@ -107,7 +107,11 @@ class SgMKLDNNConvPostQuantizeProperty : public SubgraphProperty {
     }
   }
   static SubgraphPropertyPtr Create() {
-    return std::make_shared<SgMKLDNNConvPostQuantizeProperty>();
+    auto property = std::make_shared<SgMKLDNNConvPostQuantizeProperty>();
+    property->SetAttr<std::string>("prop_name",
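The hunk above is cut off at the review anchor. The intended pattern, roughly, is to tag the property with its name at creation time, inside the SgMKLDNNConvPostQuantizeProperty class shown above, so later passes and log messages can identify it. A rough completion sketch; the attribute value below is a placeholder, not the string used in the PR:

    static SubgraphPropertyPtr Create() {
      auto property = std::make_shared<SgMKLDNNConvPostQuantizeProperty>();
      // Placeholder value; the PR sets a descriptive name for this pass.
      property->SetAttr<std::string>("prop_name",
                                     "MKLDNN convolution post-quantization pass");
      return property;
    }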
Review comment (Member):

Nit: can we use "property_name" or just "name" here? "prop" also stands for "propagation" elsewhere in the code base.

@TaoLv (Member) commented Mar 3, 2019

@reminisce Can you take a look at the subgraph-related changes?

@pengzhao-intel (Contributor) left a review comment

LGTM.

@reminisce please help confirm the subgraph-related changes :)

@@ -82,6 +82,9 @@ where
 .set_attr<mxnet::FInferShape>("FInferShape", QuantizeShape)
 .set_attr<nnvm::FInferType>("FInferType", QuantizeType)
 .set_attr<FInferStorageType>("FInferStorageType", QuantizeStorageType)
+// TODO(Xinyu): a temp solution to enable GluonCV INT8 flow,
+// will be reverted after the improvement of CachedOP is done.
Review comment (Member):

Is this currently WIP? Can you open an issue for this?

Reply from the author (@xinyu-intel, Contributor):

#14331 added :)

LOG(INFO) << "Skip subgraph " << full_name << " as it requires `grad_req=null`.";
return src;
}
}
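For context on the snippet above: the intent is to skip an inference-only subgraph pass when gradients are requested and return the original graph unchanged. A purely illustrative sketch, not the PR's exact code; the IsInferenceOnly helper and the surrounding variable names (need_grad, property, full_name, src) are assumptions:

    // Hypothetical check: if the executor needs gradients and this subgraph
    // property is an inference-only pass (e.g. post-quantization fusion),
    // leave the graph untouched instead of partitioning it.
    if (need_grad && IsInferenceOnly(property)) {
      LOG(INFO) << "Skip subgraph " << full_name << " as it requires `grad_req=null`.";
      return src;
    }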
Review comment (Member):

Can we add a test for this case: a module is bound with for_training=True, and the symbol is then reused to invoke graph partitioning with MKLDNN? The returned symbol shouldn't contain MKLDNN subgraph ops.

Reply (Contributor):

The problem is that we can't get the symbol back after the module is bound, so we can't check whether MKLDNN graph partitioning happened or not.

Reply (Member):

Why can't we use mod._sym and call get_backend_symbol("MKLDNN") on it?

Reply (Contributor):

Because there is no Python API that can return the partitioned symbol after bind; mod._sym still holds the original symbol. Also, get_backend_symbol("MKLDNN") will apply the partitioning and won't skip the inference_only pass; inference_only only takes effect during bind.

Reply (Member):

Makes sense.

@reminisce reminisce merged commit b486594 into apache:master Mar 6, 2019
vdantu pushed a commit to vdantu/incubator-mxnet that referenced this pull request Mar 31, 2019
* add fake grad

* Skip inference only subgraph pass when gradient is needed.

* add fake grad to quantizev2

* add TODO

* modify prop_name to property_name

* add test case
nswamy pushed a commit that referenced this pull request Apr 5, 2019 (same commit messages as above)
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019 (same commit messages as above)
Labels: pr-awaiting-review (PR is waiting for code review)
8 participants