This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

dynamic custom operator support #15921

Merged
merged 139 commits into from
Dec 6, 2019

samskalicky
Contributor

@samskalicky samskalicky commented Aug 15, 2019

Description

Enhancements to dynamic library loading to support custom operators in libraries.

  • added MXTensor/MXDtype structure, and versioning to lib_api.h
  • added an operator-registration capability for custom ops, similar to NNVM's op registration
  • operators are found in the library, and re-registered in MXNet during library loading
  • operators are re-registered from mx.nd.op to mx.nd and mx.sym.op to mx.sym shortcuts
  • Created a new example library in "example/lib_ops" with a GEMM operator

Initially, this project was proposed on the CWiki; however, the design has evolved since the initial proposal. The current design is described below.

Design

The goal of this PR is to enable operators to be implemented in separate libraries and loaded at runtime.

The main constraint is to maintain a low-level C-types only boundary between MXNet and the library to simplify the building and compiling of external libraries.

Working backwards from the user, users register operators with easy-to-use function prototypes like:

int myForward(std::map<std::string,std::string> attrs,
               std::vector<MXTensor> inputs, 
               std::vector<MXTensor> outputs);

Users' Forward (i.e., FCompute) functions are called by a helper function, _opCallFCompute, that converts the C types passed across the library boundary into the familiar STL types. This function is implemented in the lib_api.h header file that users compile with their library.

int _opCallFCompute(fcomp_t fcomp, const char* const* keys, const char* const* vals, int num,
                    const int64_t** inshapes, int* indims, void** indata, int* intypes, int num_in,
                    const int64_t** outshapes, int* outdims, void** outdata, int* outtypes, int num_out);
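The conversion described above can be illustrated with a minimal sketch. It is not the code in lib_api.h: SketchTensor, toAttrMap, and toTensors are hypothetical names introduced here, and the real MXTensor layout may differ. It only shows how parallel C arrays (keys/values, shape pointers, ranks, data pointers, type codes) could be rebuilt into STL containers on the library side.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for MXTensor; the real struct in lib_api.h may differ.
struct SketchTensor {
  void* data;                  // borrowed pointer, not owned by the tensor
  std::vector<int64_t> shape;  // copied from the C shape array
  int dtype;                   // opaque type code
};

// Rebuild the attribute map from parallel key/value C-string arrays.
std::map<std::string, std::string> toAttrMap(const char* const* keys,
                                             const char* const* vals, int num) {
  assert(num >= 0);
  std::map<std::string, std::string> attrs;
  for (int i = 0; i < num; ++i) attrs[keys[i]] = vals[i];
  return attrs;
}

// Rebuild tensors from per-tensor shape pointers, ranks, data pointers,
// and type codes, mirroring the argument groups in the prototype above.
std::vector<SketchTensor> toTensors(const int64_t** shapes, const int* dims,
                                    void** data, const int* types, int count) {
  std::vector<SketchTensor> tensors;
  tensors.reserve(count);
  for (int i = 0; i < count; ++i) {
    tensors.push_back(SketchTensor{
        data[i], std::vector<int64_t>(shapes[i], shapes[i] + dims[i]), types[i]});
  }
  return tensors;
}
```

With helpers of this kind, the boundary function would build the attribute map and the input/output tensor vectors, then call the user's function with them.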

In MXNet's C API, the _opCallFCompute function is found in the library. A lambda function fcomp_conv is created for each operator loaded from the library to convert from MXNet-types to C-types. Then these C-types are passed to _opCallFCompute.

auto fcomp_conv = [=](const nnvm::NodeAttrs& attrs,
                      const OpContext& ctx,
                      const std::vector<TBlob>& inputs,
                      const std::vector<OpReqType>& req,
                      const std::vector<TBlob>& outputs);
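As a rough illustration of the wrapping lambda, the sketch below captures a C function pointer by value and flattens C++ containers into raw pointers and sizes before calling it. All names here (c_fcomp_sketch_t, FComputeSketch, makeWrapper, doubleOp) are hypothetical, the signatures are far simpler than the real FCompute, and the nonzero-means-success convention is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical C-only entry point, standing in for a function found in the library.
typedef int (*c_fcomp_sketch_t)(const float* in, int64_t n, float* out);

// Framework-side callable using richer C++ types (a stand-in for FCompute).
using FComputeSketch =
    std::function<bool(const std::vector<float>&, std::vector<float>&)>;

// Build a lambda that captures the function pointer by value and converts
// the vectors to raw pointers and a length before crossing the boundary.
FComputeSketch makeWrapper(c_fcomp_sketch_t fn) {
  assert(fn != nullptr);
  return [=](const std::vector<float>& in, std::vector<float>& out) {
    out.resize(in.size());
    // Assumed convention for this sketch: nonzero return means success.
    return fn(in.data(), static_cast<int64_t>(in.size()), out.data()) != 0;
  };
}

// Example "library" operator: doubles every element.
int doubleOp(const float* in, int64_t n, float* out) {
  for (int64_t i = 0; i < n; ++i) out[i] = 2.0f * in[i];
  return 1;
}
```

Capturing by value matters here because the lambda outlives the loading function that created it.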

The same design is used for: parseAttrs, inferShape, inferType, etc.

Finally, an operator is re-registered in MXNet with the lambda function like:

nnvm::Op &regOp = dmlc::Registry<nnvm::Op>::Get()->__REGISTER_OR_GET__(name);
regOp.set_attr<FCompute>("FCompute<cpu>",fcomp_conv);
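The register-or-get pattern used above can be sketched with a toy registry. This is not dmlc::Registry or nnvm::Op: OpRegistry, OpEntry, and ComputeFn are hypothetical types, and the compute signature is reduced to int(int) purely for illustration. The point is that looking up a name either creates a new entry or returns the existing one, and a callable is then attached under an attribute key.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical compute signature; the real FCompute takes NodeAttrs,
// OpContext, TBlobs, and OpReqTypes.
using ComputeFn = std::function<int(int)>;

// Minimal operator entry holding named function attributes.
struct OpEntry {
  std::string name;
  std::map<std::string, ComputeFn> attrs;

  OpEntry& setAttr(const std::string& key, ComputeFn fn) {
    attrs[key] = std::move(fn);
    return *this;
  }
};

// Minimal registry: returns the existing entry for a name or creates one.
class OpRegistry {
 public:
  OpEntry& registerOrGet(const std::string& name) {
    OpEntry& entry = ops_[name];  // default-constructs on first lookup
    entry.name = name;
    return entry;
  }
  bool has(const std::string& name) const { return ops_.count(name) > 0; }

 private:
  std::map<std::string, OpEntry> ops_;
};
```

In this sketch, calling registerOrGet twice with the same name yields the same entry, so attributes set during library loading remain visible to later lookups.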

Once the C API returns back to Python in the load function in library.py, we regenerate the Python bindings and re-register the operator shortcuts to mx.nd and mx.sym.

After the load function returns to the user's Python code, they can use their operators just like any other operator:

mx.nd.myCustomOp(A,B,C)

Current Features

  • custom CPU operators
  • stateless & stateful operators
  • custom subgraph operators
  • Memory resource request

Future/Next-steps

(to be done in a separate PR)

  • custom GPU operators
  • Random number generator resource request
  • sparse data types
  • migrate lambda functions in MXLoadLib in src/c_api/c_api.cc to classes defined elsewhere
  • Documentation, add the "library" python package to the namespace to the doc: https://mxnet.apache.org/api/python/docs/api/ ?

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • updated path to be absolute in example/lib_api/test.py
  • update the Makefile/CMakeLists.txt to build the new example in lib_ops instead of lib_api
  • moved library import in python/mxnet/__init__.py to after ndarray/symbol
  • added operator reregistration to python/mxnet/library.py
  • added operator discovery/registration in MXLoadLib in src/c_api/c_api.cc

Comments

@samskalicky
Contributor Author

@wkcn while this PR is not quite done yet, it would be great to get some early feedback since the design/implementation has changed since our initial discussion. Let me know what you think, thanks!

@rondogency
Contributor

@wkcn The 1.6 code freeze is tomorrow, so are you OK with this one not going into the 1.6 release? None of us will have time to maintain it until late November. After the code freeze we can merge it on Friday, and users can use the nightly build to access this feature.

@wkcn
Member

wkcn commented Oct 24, 2019

@rondogency No problem : )

@wkcn
Member

wkcn commented Nov 28, 2019

Hi @samskalicky and @rondogency , is it ready to merge this PR after CI passes?

@samskalicky
Contributor Author

> Hi @samskalicky and @rondogency , is it ready to merge this PR after CI passes?

Yes! We're soooooo ready to merge :)

Thanks @zachgk for rerunning the unix_cpu job!

@wkcn
Member

wkcn commented Dec 6, 2019

I will merge this PR after the CI passes.
Thanks to all contributors!

@wkcn wkcn added the pr-awaiting-merge Review and CI is complete. Ready to Merge label Dec 6, 2019
@wkcn wkcn merged commit ae472c2 into apache:master Dec 6, 2019
@rondogency
Contributor

@wkcn Big thanks to Jackie for the merging work!


def check_platform():
    return platform.machine() not in ['x86_64', 'AMD64']

@unittest.skipIf(check_platform(), "not all machine types supported")
@unittest.skipIf(is_cd_run(), "continuous delivery run - ignoring test")
def test_library_loading():
def test_custom_op():
Member

This test strongly assumes that it will be run from the MXNet root folder. Otherwise, libsample_lib.so will not be found.

$ cd tests/python/unittest/
$ nosetests -v test_extensions:test_custom_op
test_extensions.test_custom_op ... ERROR

======================================================================
ERROR: test_extensions.test_custom_op
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/lvtao/miniconda3/envs/mxnet/lib/python3.6/site-packages/nose/case.py", line 198, in runTest
    self.test(*self.arg)
  File "/home/lvtao/Workspace/mxnet-official/tests/python/unittest/test_extensions.py", line 41, in test_custom_op
    raise MXNetError("library %s not found " % lib)
mxnet.base.MXNetError: library libsample_lib.so not found

----------------------------------------------------------------------
Ran 1 test in 0.005s

FAILED (errors=1)

Contributor

Gotcha, I will fix it in the next PR.

@samskalicky samskalicky mentioned this pull request Dec 13, 2019
4 tasks
leezu pushed a commit that referenced this pull request Apr 8, 2020
Add random number generator support for custom operator libraries.

Design: MXNet passes its initialized and seeded random number generator states, located on CPU and GPU, to the custom library, so users can generate deterministic values from a given seed passed to MXNet. The basic workflow is:

mx.random.seed(128)
r1 = mx.nd.some_custom_random_op(data)
mx.random.seed(128)
r2 = mx.nd.some_custom_random_op(data)
assert (r1 == r2)

This PR does not make a custom library generate exactly the same sequence of random numbers as MXNet.
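The determinism property in this workflow can be sketched on the library side as follows. This is only an illustration: std::mt19937 stands in for the seeded state that MXNet passes to the library, and customRandomOp is a hypothetical operator, not part of MXNet or lib_api.h. The idea is that an operator drawing from a caller-supplied state produces the same values whenever that state starts from the same seed.

```cpp
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

// Hypothetical custom random operator: draws n values from a generator
// state supplied by the caller instead of creating its own generator.
std::vector<uint32_t> customRandomOp(std::mt19937& state, int n) {
  assert(n >= 0);
  std::vector<uint32_t> out;
  out.reserve(n);
  for (int i = 0; i < n; ++i) {
    out.push_back(static_cast<uint32_t>(state()));
  }
  return out;
}
```

Seeding two states with the same value (for example 128) and calling customRandomOp on each yields equal vectors, which mirrors the r1 == r2 check above. Because the generator here differs from MXNet's own, the values would not be expected to match those produced by MXNet's built-in random operators.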

This is a continuation of the custom operator project #15921 and #17270
samskalicky pushed a commit to samskalicky/incubator-mxnet that referenced this pull request Apr 15, 2020
Add random number generator support for custom operator libraries.

pengzhao-intel pushed a commit that referenced this pull request Apr 16, 2020
…18069)

* Dynamic subgraph compile support (#17623)

This PR adds support for passing the NDArrays from the existing optimize_for API down to the reviewSubgraph function in an external library. It also adds a new API for HybridBlock called optimize_for that can partition the model without running a forward pass.

Feature changes

    Adds new API to HybridBlock optimize_for that partitions the model but does not call the cachedOp
    Modifies the subgraph library example to optionally require args to be provided
    Adds annotation on subgraph inputs for the name of the original param so that inputs can be mapped and passes annotations to input nodes of subgraphs
    Adds support for tensors in MKLDNN format, calls Reorder2Default

New tests

    Adds a new test to partition operators that directly consume params
    add a new model to test where ops to be partitioned have args/params

Bug Fixes

    fixes bug in passing ids vector by value instead of by reference
    fixes bug in passing copies of attributes instead of by reference
    fixes bug where _cached_graph was not updated after partitioning
    fixes memory leak where user-specified attributes on subgraph ops were not freed if subgraph was rejected
    fixes problem incorrectly indexing into shape/dtype maps when annotating the graph

Docs

    Updates the README doc with the latest changes described above

* Adding sparse support to MXTensor for custom operators (#17569)

* Added enum for sparse storage

* Add structure for Dense and Sparse

* redesign the data structure for MXSparse

* pull out aux data from sparse NDArray

* Added more sparse arguments to API interface

* Passed sparse from c_api to lib_api.h and set in MXTensor

* Fix indent

* fix segfault

* Fix NDArray to MXTensor errors

* Add a sample of sparse(CSR) transpose

* Make CSR transpose temporarily work by hardcoding

* Fixed sparse output size(Refined)

* Add tests for symbolic and stateful ops

* Added a sample for row sparse transpose

* Added real row sparse transpose

* Fix output size issue by adding lambda for CheckAndAlloc()

* Fix mixed storage formats error

* Added infer storage type function

* resolve comments

* Set inferSType as optional function

* Resolve comments

* Add error messages

* Resolve comments

* verify transpose ops results

* fix sanity check

* update MX_LIBRARY_VERSION to 5

* Custom Operator Random Number Generator Support (#17762)

Add random number generator support for custom operator libraries.

Design: MXNet passes its initialized and seeded random number generator states, located on CPU and GPU, to the custom library, so users can generate deterministic values from a given seed passed to MXNet. The basic workflow is:

mx.random.seed(128)
r1 = mx.nd.some_custom_random_op(data)
mx.random.seed(128)
r2 = mx.nd.some_custom_random_op(data)
assert (r1 == r2)

This PR does not make a custom library generate exactly the same sequence of random numbers as MXNet.

This is a continuation of the custom operator project #15921 and #17270

Co-authored-by: guanxinq <[email protected]>
Co-authored-by: Ziyi Mu <[email protected]>
pengzhao-intel pushed a commit that referenced this pull request Apr 16, 2020
Labels
Operator pr-awaiting-merge Review and CI is complete. Ready to Merge
8 participants