Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[NGRAPH] MXNet - nGraph initial integration #12502

Closed

Conversation

mbrookhart
Copy link

@mbrookhart mbrookhart commented Sep 10, 2018

Description

Integrate MXNet with the Intel nGraph compiler using the Subgraph API feature. This PR provides integration for a number of models on the CPU. More models and backends will be coming in future PRs.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant JIRA issue created (except PRs with tiny changes)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • nGraph Integration through the Subgraph API
  • Better performance on a number of CPU models

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

@ashokei ashokei force-pushed the ngraph_initial_upstream branch 2 times, most recently from 0da8f18 to 6aca854 Compare September 11, 2018 05:14
@eric-haibin-lin
Copy link
Member

@zheng-da

@kalyc
Copy link
Contributor

kalyc commented Sep 14, 2018

Thanks for your contribution @mbrookhart
@mxnet-label-bot[pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Sep 14, 2018
@eric-haibin-lin
Copy link
Member

What's the test coverage? Do you need any help setting up the ngraph build test on MXNet CI?

@mbrookhart
Copy link
Author

Hi Eric,

We have a secondary test system working, but yeah, it would awesome to get some help setting up MXNet's CI, we haven't figured out exactly how that all works yet.

Matthew

@mbrookhart
Copy link
Author

@eric-haibin-lin I think we have the CI system properly building with nGraph, starting to look at unit tests now. We see a few tests fail because the Subgraph API changes the number of nodes in the graph. According to @zheng-da, this is expected behavior. Would we expect the CI job to fail if we enabled those tests?

@vandanavk
Copy link
Contributor

@marcoabreu for review and inputs

@ashokei ashokei force-pushed the ngraph_initial_upstream branch 2 times, most recently from c6e2149 to 82670da Compare September 26, 2018 02:39
@ashokei
Copy link
Contributor

ashokei commented Sep 26, 2018

@szha @anirudh2290 as code owner can you please review let us know feedback if any; CI and full unit-test are enabled.

Let us know anything we can do to expedite review/merge process.

include/mxnet/ndarray.h Outdated Show resolved Hide resolved
include/mxnet/ndarray.h Outdated Show resolved Hide resolved
src/c_api/c_api_symbolic.cc Show resolved Hide resolved
src/operator/contrib/ngraph-inl.h Outdated Show resolved Hide resolved
src/operator/contrib/ngraph-inl.h Show resolved Hide resolved
src/operator/contrib/ngraph.cc Outdated Show resolved Hide resolved
src/operator/subgraph/partition_graph.cc Show resolved Hide resolved
src/operator/subgraph/partition_graph.cc Show resolved Hide resolved
@zheng-da
Copy link
Contributor

@mbrookhart @ashokei sorry for the late review. are you going to provide tests? it seems your code isn't tested at all.

@mbrookhart
Copy link
Author

mbrookhart commented Sep 27, 2018

@zheng-da Thanks for the review! We added nGraph build and unit test jobs to Jenkins, they're running all of tests/python/unittest against nGraph through the subgraph API, aside from a few tests who's assumptions are broken by the Subgraph API.

@ashokei
Copy link
Contributor

ashokei commented Sep 27, 2018

thanks @zheng-da for detailed review. Since this change just reuses subgraph API, we want to mainly test mxnet unit-tests through ngraph with MXNET_SUBGRAPH_BACKEND. This PR does that, and tests/python/unittest in CI goes through mxnet -> ngraph.

Copy link
Contributor

@reminisce reminisce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you prepare an example showing how to use this feature in the folder example?

After building MXNet with nGraph support, users can enable nGraph backend by setting `MXNET_SUBGRAPH_BACKEND="ngraph"`environmental variable.

Gluon support is experimental and may or may not yield good performance. Gluon-NGraph
integration can be enabled by setting the environmental variable `MXNET_NGRAPH_GLUON=1`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is an additional evn var needed for Gluon? How different is it from MXNET_SUBGRAPH_BACKEND?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we use MXNET_NGRAPH_GLUON env var for enabling/disabling mxnet imperative/gluon execution support for ngraph integration. This is not related(does not effect) to MXNET_SUBGRAPH_BACKEND.

NGRAPH_README.md Outdated Show resolved Hide resolved
NGRAPH_README.md Outdated Show resolved Hide resolved
NGRAPH_README.md Outdated Show resolved Hide resolved
src/executor/graph_executor.cc Show resolved Hide resolved
src/operator/contrib/ngraph-inl.h Show resolved Hide resolved
src/operator/contrib/ngraph-inl.h Outdated Show resolved Hide resolved
@mbrookhart
Copy link
Author

mbrookhart commented Sep 27, 2018

@reminisce Thanks for the review! What kind of examples would you like to see? After compiling with USE_NGRAPH=1, all Module-based examples running on cpu should transparently work with nGraph CPU unless the user selects a different subgraph backend with MXNET_SUBGRAPH_BACKEND. Future PRs with other backends will certainly include examples.

@anirudhacharya
Copy link
Member

@zheng-da @szha ping for another round of reviews :)

@eric-haibin-lin
Copy link
Member

@ashokei could you resolve conflicts? Thanks

@ashokei
Copy link
Contributor

ashokei commented Mar 4, 2019

thanks @anirudhacharya .
@eric-haibin-lin we fixed conflicts. @zheng-da @szha .

@szha szha dismissed their stale review March 6, 2019 05:55

concerns addressed for now

Copy link
Member

@szha szha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the plan for the bridge code?

@louisfeng
Copy link

@szha Our bridge code can be treated as 3rd party submodule and is publicly open source: https://github.com/NervanaSystems/ngraph-mxnet-bridge

@karan6181
Copy link
Contributor

@mbrookhart @ashokei Thank you for your contribution! can you please address the review comments? Thank you!

@eric-haibin-lin
Copy link
Member

@szha any other concerns?

@piyushghai
Copy link
Contributor

@mbrookhart Pinging again. Can you rebase with master branch and address the review comments ?

@louisfeng
Copy link

louisfeng commented Apr 11, 2019

@mbrookhart Pinging again. Can you rebase with master branch and address the review comments ?

We have followed up with @szha a few weeks ago on the slack channel and waiting for him to approve this PR.

@Roshrini
Copy link
Member

@szha Can you take a look at this PR? @mbrookhart Can you please resolve conflicts?

@roywei
Copy link
Member

roywei commented Apr 29, 2019

@mxnet-label-bot add [pr-awaiting-response]

@marcoabreu marcoabreu added the pr-awaiting-response PR is reviewed and waiting for contributor to respond label Apr 29, 2019
@vandanavk
Copy link
Contributor

@szha is this PR good to go? @mbrookhart could you resolve the merge conflicts?

@louisfeng
Copy link

louisfeng commented May 8, 2019

@szha is this PR good to go? @mbrookhart could you resolve the merge conflicts?

We have resolved the merge conflicts a few times previously. It probably makes sense to resolve the conflicts before the merge once the PR is approved. Otherwise we need to continuously fixing the conflicts.

@karan6181
Copy link
Contributor

@szha , @anirudh2290 Could you please review this PR again? @louisfeng waiting for the response. Thanks!

@abhinavs95
Copy link
Contributor

@szha Could you please take a look again? Thanks.

@piyushghai
Copy link
Contributor

@mbrookhart Can you rebase with the latest master ?

@szha @anirudh2290 Bouncing off for review...

@Roshrini
Copy link
Member

@szha @anirudh2290 Ping again for review

@karan6181
Copy link
Contributor

@mbrookhart Could you please resolve the merge conflicts and @szha , @anirudh2290 will review it once the PR is updated!! Thanks!

@piyushghai
Copy link
Contributor

@mbrookhart Could you please resolve the merge conflicts and @szha , @anirudh2290 will review it once the PR is updated!! Thanks!

@mbrookhart mbrookhart closed this Aug 7, 2019

const SubgraphPropertyPtr& subg_prop = g->GetAttr<SubgraphPropertyPtr>("subgraph_property");
nnvm::NodePtr n;
if (!subg_prop->NeedGraphAttrs()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this cause any overhead when ngraph isn’t enabled?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
pr-awaiting-response PR is reviewed and waiting for contributor to respond pr-awaiting-testing PR is reviewed and waiting CI build and test
Projects
None yet
Development

Successfully merging this pull request may close these issues.