Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Graph partitioner and subgraph op #11251

Merged
merged 37 commits into from
Jun 21, 2018
Merged

Conversation

reminisce
Copy link
Contributor

Description

(Brief description on what this PR is about)

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@piiswrong @zheng-da

@reminisce reminisce requested a review from szha as a code owner June 13, 2018 00:55
@reminisce reminisce changed the title [WIP] Subgraph op [WIP] Graph partitioner and subgraph op Jun 13, 2018
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you write a simple README for this file? How to run this test? How can users play with this test?

Copy link
Contributor Author

@reminisce reminisce Jun 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use the following command to try it. Please remember to change ctx passed to the function score into mx.cpu() if you are running on a cpu. It's just a preliminary end-to-end run-through for proof of concept. More changes are coming.

python --model=imagenet1k-resnet-152 --dataset=data --num-inference-batches=10

sym, arg_params, aux_params = mx.model.load_checkpoint(prefix, epoch)
op_names = ['BatchNorm', 'Convolution', 'Pooling', 'Activation']
out = SymbolHandle()
check_call(_LIB.MXPartitionGraph(sym.handle, mx_uint(len(op_names)), c_str_array(op_names),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about wrapping this function into Symbol class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will hide this from users. The API design is still under discussion.

kCrossDeviceCopy,
/*!
* A subgraph execution should happen in the main thread, instead of
* in the execution engine.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

operators inside a subgraph are still executed in the execution engine, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@@ -33,8 +33,12 @@
namespace mxnet {
namespace engine {

#if 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for debug?


namespace mxnet {
namespace engine {

class NaiveVar final
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a document for this class.

@@ -71,8 +81,11 @@ class NaiveEngine final : public Engine {

// new variables
VarHandle NewVariable() override {
return NaiveVar::New();
#if 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for debug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.

@pengzhao-intel
Copy link
Contributor

Great, this will be very helpful for us 👍
Thanks, @reminisce @zheng-da

if (!op_name_set.empty()) {
mxnet::op::SubgraphPropertyPtr property
= std::make_shared<mxnet::op::DefaultSubgraphProperty>(op_name_set);
g.attrs["subgraph_property"] = std::make_shared<nnvm::any>(std::move(property));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for mxnet to enable several acceleration backends at the same time? So then there will be several different subgraph_property for different backends respectively?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not exactly. Your implementation of SubgraphProperty determines what subgraph op node is used and consequently, what Forward/Backward implementation is adopted. If you want to support multiple backend at the same time, you just need to configure that in the implementation of Forward/Backward. The question is, how do you determine when to use which backend?


// filter out unqualified pre-selected nodes
std::vector<nnvm::Node*> filtered_nodes = preselected_nodes;
subgraph_selector->Filter(g, &filtered_nodes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we have Filter to return a vector of Nodes that are selected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, it's easier for users to implement this function. I will make the change.

}
// find out subgraphs from the filtered nodes
std::vector<std::vector<SimpleNode*>> subgraphs;
PostProcessNodeCandidates(*g, filtered_nodes, simple_nodes, &subgraphs, &subgraph_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we require the nodes returned from Filter belong to a single subgraph?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this part. If users only care about the network pattern matching, requiring it to return a single subgraph might be too strict.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean finding a single subgraph from each node. The algorithm starts to search for subgraphs from every node. It should be sufficient to find at most one subgraph from every node.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. That seems clearer. I will change the logic of getting multiple subgraphs from the filtered nodes into getting a single subgraph starting from the seed node passed to PreSelectSubgraphNodes.

reminisce and others added 23 commits June 20, 2018 16:28
* add functions for cutting edges.

* construct subgraphs.

* generalize graph partition.

* restructure the code.

* create SubgraphOpState.

* register subgraph property.

* rename.

* address comments.

* update select API.

* rename.

* set subgraph property.

* fix bugs.

* fix bugs.
* remove SubgraphOperator.

* change the interface of SubgraphSelector.
* update tests.

* fix shape/dtype/storage inference.

* fix.
* execute subgraph operators correctly.

* add HasSubgraph attr.

* support subgraph op in imperative.

* rewrite running subgraph in main thread.

* execute subgraph op in imperative.

* remove need_grad.
exec::DevMaskVector dev_masks(idx_g.num_node_entries(), dev_mask);

// Put the input and output storages to the storage vector.
nnvm::StorageVector stypes(idx_g.num_node_entries(), exec::kBadStorageID);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nnvm::StorageVector is used for plan_memory, not for storage type inference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Fixed.

@reminisce reminisce changed the title [WIP] Graph partitioner and subgraph op Graph partitioner and subgraph op Jun 21, 2018
@reminisce reminisce merged commit 9567bd7 into apache:subgraph Jun 21, 2018
reminisce added a commit that referenced this pull request Aug 4, 2018
Graph partitioner and subgraph op
ashokei pushed a commit to NervanaSystems/incubator-mxnet that referenced this pull request Aug 4, 2018
Graph partitioner and subgraph op
reminisce added a commit to reminisce/mxnet that referenced this pull request Aug 14, 2018
Graph partitioner and subgraph op

Fix duplicate entry bugs (apache#11767)

Make subgraph var node name unique (apache#11876)

[DO NOT REVIEW] Fix bug of eliminating cycles (apache#11907)

* Fix cycle bug

* Fix decycle bug

* Fix comment

[DO NOT REVIEW] Subgraph API (apache#12104)

* Initial commit

* Add unit tests

* Fix lint

* Fix lint

* Clean up

* Add graph partitiong to Bind

* Add property name to graph partitioning c api

* Fix unit test gpu context

* Address cr

* Move subgraph to attrs.subgraphs and fix the example

* Fix lint

* Add var version unit test

* Address cr

* Enable unit test that was flaky
reminisce added a commit to reminisce/mxnet that referenced this pull request Aug 21, 2018
Graph partitioner and subgraph op

Fix duplicate entry bugs (apache#11767)

Make subgraph var node name unique (apache#11876)

[DO NOT REVIEW] Fix bug of eliminating cycles (apache#11907)

* Fix cycle bug

* Fix decycle bug

* Fix comment

[DO NOT REVIEW] Subgraph API (apache#12104)

* Initial commit

* Add unit tests

* Fix lint

* Fix lint

* Clean up

* Add graph partitiong to Bind

* Add property name to graph partitioning c api

* Fix unit test gpu context

* Address cr

* Move subgraph to attrs.subgraphs and fix the example

* Fix lint

* Add var version unit test

* Address cr

* Enable unit test that was flaky
reminisce added a commit to reminisce/mxnet that referenced this pull request Aug 23, 2018
Graph partitioner and subgraph op

Fix duplicate entry bugs (apache#11767)

Make subgraph var node name unique (apache#11876)

[DO NOT REVIEW] Fix bug of eliminating cycles (apache#11907)

* Fix cycle bug

* Fix decycle bug

* Fix comment

[DO NOT REVIEW] Subgraph API (apache#12104)

* Initial commit

* Add unit tests

* Fix lint

* Fix lint

* Clean up

* Add graph partitiong to Bind

* Add property name to graph partitioning c api

* Fix unit test gpu context

* Address cr

* Move subgraph to attrs.subgraphs and fix the example

* Fix lint

* Add var version unit test

* Address cr

* Enable unit test that was flaky
reminisce added a commit to reminisce/mxnet that referenced this pull request Aug 24, 2018
Graph partitioner and subgraph op

Fix duplicate entry bugs (apache#11767)

Make subgraph var node name unique (apache#11876)

[DO NOT REVIEW] Fix bug of eliminating cycles (apache#11907)

* Fix cycle bug

* Fix decycle bug

* Fix comment

[DO NOT REVIEW] Subgraph API (apache#12104)

* Initial commit

* Add unit tests

* Fix lint

* Fix lint

* Clean up

* Add graph partitiong to Bind

* Add property name to graph partitioning c api

* Fix unit test gpu context

* Address cr

* Move subgraph to attrs.subgraphs and fix the example

* Fix lint

* Add var version unit test

* Address cr

* Enable unit test that was flaky
reminisce added a commit to reminisce/mxnet that referenced this pull request Aug 28, 2018
Graph partitioner and subgraph op

Fix duplicate entry bugs (apache#11767)

Make subgraph var node name unique (apache#11876)

[DO NOT REVIEW] Fix bug of eliminating cycles (apache#11907)

* Fix cycle bug

* Fix decycle bug

* Fix comment

[DO NOT REVIEW] Subgraph API (apache#12104)

* Initial commit

* Add unit tests

* Fix lint

* Fix lint

* Clean up

* Add graph partitiong to Bind

* Add property name to graph partitioning c api

* Fix unit test gpu context

* Address cr

* Move subgraph to attrs.subgraphs and fix the example

* Fix lint

* Add var version unit test

* Address cr

* Enable unit test that was flaky
reminisce added a commit to reminisce/mxnet that referenced this pull request Aug 29, 2018
Graph partitioner and subgraph op

Fix duplicate entry bugs (apache#11767)

Make subgraph var node name unique (apache#11876)

[DO NOT REVIEW] Fix bug of eliminating cycles (apache#11907)

* Fix cycle bug

* Fix decycle bug

* Fix comment

[DO NOT REVIEW] Subgraph API (apache#12104)

* Initial commit

* Add unit tests

* Fix lint

* Fix lint

* Clean up

* Add graph partitiong to Bind

* Add property name to graph partitioning c api

* Fix unit test gpu context

* Address cr

* Move subgraph to attrs.subgraphs and fix the example

* Fix lint

* Add var version unit test

* Address cr

* Enable unit test that was flaky
eric-haibin-lin pushed a commit that referenced this pull request Aug 31, 2018
* Graph partitioner and subgraph op (#11251)

Graph partitioner and subgraph op

Fix duplicate entry bugs (#11767)

Make subgraph var node name unique (#11876)

[DO NOT REVIEW] Fix bug of eliminating cycles (#11907)

* Fix cycle bug

* Fix decycle bug

* Fix comment

[DO NOT REVIEW] Subgraph API (#12104)

* Initial commit

* Add unit tests

* Fix lint

* Fix lint

* Clean up

* Add graph partitiong to Bind

* Add property name to graph partitioning c api

* Fix unit test gpu context

* Address cr

* Move subgraph to attrs.subgraphs and fix the example

* Fix lint

* Add var version unit test

* Address cr

* Enable unit test that was flaky

* Clean up

* Clean up

* Clean up

* Change version return type in NDArray

* Clean up

* Add register or get for subgraph prop registry

* Address cr

* Remove unnecessary code

* Handle var version issue in naive engine

* Delete example

* Remove registration of resource request for default subgraph op

* Add doc string

* Improve doc string
aaronmarkham pushed a commit to aaronmarkham/incubator-mxnet that referenced this pull request Sep 11, 2018
* Graph partitioner and subgraph op (apache#11251)

Graph partitioner and subgraph op

Fix duplicate entry bugs (apache#11767)

Make subgraph var node name unique (apache#11876)

[DO NOT REVIEW] Fix bug of eliminating cycles (apache#11907)

* Fix cycle bug

* Fix decycle bug

* Fix comment

[DO NOT REVIEW] Subgraph API (apache#12104)

* Initial commit

* Add unit tests

* Fix lint

* Fix lint

* Clean up

* Add graph partitiong to Bind

* Add property name to graph partitioning c api

* Fix unit test gpu context

* Address cr

* Move subgraph to attrs.subgraphs and fix the example

* Fix lint

* Add var version unit test

* Address cr

* Enable unit test that was flaky

* Clean up

* Clean up

* Clean up

* Change version return type in NDArray

* Clean up

* Add register or get for subgraph prop registry

* Address cr

* Remove unnecessary code

* Handle var version issue in naive engine

* Delete example

* Remove registration of resource request for default subgraph op

* Add doc string

* Improve doc string
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this pull request Sep 19, 2018
* Graph partitioner and subgraph op (apache#11251)

Graph partitioner and subgraph op

Fix duplicate entry bugs (apache#11767)

Make subgraph var node name unique (apache#11876)

[DO NOT REVIEW] Fix bug of eliminating cycles (apache#11907)

* Fix cycle bug

* Fix decycle bug

* Fix comment

[DO NOT REVIEW] Subgraph API (apache#12104)

* Initial commit

* Add unit tests

* Fix lint

* Fix lint

* Clean up

* Add graph partitiong to Bind

* Add property name to graph partitioning c api

* Fix unit test gpu context

* Address cr

* Move subgraph to attrs.subgraphs and fix the example

* Fix lint

* Add var version unit test

* Address cr

* Enable unit test that was flaky

* Clean up

* Clean up

* Clean up

* Change version return type in NDArray

* Clean up

* Add register or get for subgraph prop registry

* Address cr

* Remove unnecessary code

* Handle var version issue in naive engine

* Delete example

* Remove registration of resource request for default subgraph op

* Add doc string

* Improve doc string
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants