Conversation
rondogency requested review from aaronmarkham, anirudh2290, eric-haibin-lin, and szha as code owners (January 11, 2020 00:35)
eric-haibin-lin suggested changes (Jan 13, 2020)
Some interfaces require more discussion.
samskalicky reviewed (Jan 16, 2020)
samskalicky reviewed (Jan 16, 2020)
samskalicky reviewed (Jan 16, 2020)
samskalicky reviewed (Jan 16, 2020)
samskalicky reviewed (Jan 16, 2020)
samskalicky reviewed (Jan 16, 2020)
samskalicky reviewed (Jan 16, 2020)
samskalicky reviewed (Jan 16, 2020)
samskalicky reviewed (Jan 16, 2020)
samskalicky reviewed (Jan 16, 2020)
samskalicky reviewed (Jan 17, 2020)
samskalicky reviewed (Jan 17, 2020)
samskalicky reviewed (Jan 21, 2020)
samskalicky reviewed (Jan 21, 2020)
samskalicky reviewed (Jan 21, 2020)
@ptrendx thanks for your comments! I have resolved them, and I would appreciate it if you could take another quick look and approve.
samskalicky reviewed (Jan 28, 2020)
@rondogency looks like the Windows build/test is working now with those CMake changes. Now we just need to work through the flaky tests.
samskalicky reviewed (Jan 28, 2020)
ptrendx reviewed (Jan 29, 2020)
ptrendx reviewed (Jan 29, 2020)
ptrendx reviewed (Jan 29, 2020)
wkcn reviewed (Jan 29, 2020)
ptrendx approved these changes (Jan 30, 2020)
wkcn reviewed (Jan 31, 2020)
wkcn approved these changes (Jan 31, 2020)
Great! LGTM. Thank you!
wkcn added the pr-awaiting-merge label (Review and CI is complete. Ready to Merge) and removed the pr-awaiting-review label (PR is waiting for code review) (Jan 31, 2020)
@wkcn Thanks! Can you help me merge it?
Merged. : )
zheyuye pushed a commit to zheyuye/incubator-mxnet that referenced this pull request (Feb 19, 2020)

* poc gpu customop end to end
* add backward and device id
* clear up customop makefile
* new fcomp register
* new setforward to pass custom context to c_api
* resolve sam comment: add cond register and fix setforward char
* tmp stateful op
* passing ctx of stateful op
* add gpu alloc and refactor all fcomp
* resolve sam comments and refactor alloc
* add gpu check to pass cpu build
* add unittest and resolve ptrend comments
* add cmake and jenkins
* fix windows
* windows gpu cmake build fix
* remove verbose
leezu pushed a commit that referenced this pull request (Apr 8, 2020)

Add random number generator support for custom operator libraries.

Design: We pass from MXNet the initialized and seeded states, located on CPU and GPU, to the custom library, so the user could use those seeds to generate deterministic values from a given seed passed to MXNet. Basically this workflow:

mx.random.seed(128)
r1 = mx.nd.some_custom_random_op(data)
mx.random.seed(128)
r2 = mx.nd.some_custom_random_op(data)
assert (r1 == r2)

This PR does not let the custom library generate exactly the same sequence of random numbers as MXNet.

This is a continuation of the custom operator project #15921 and #17270
samskalicky pushed a commit to samskalicky/incubator-mxnet that referenced this pull request (Apr 15, 2020)

Add random number generator support for custom operator libraries.

Design: We pass from MXNet the initialized and seeded states, located on CPU and GPU, to the custom library, so the user could use those seeds to generate deterministic values from a given seed passed to MXNet. Basically this workflow:

mx.random.seed(128)
r1 = mx.nd.some_custom_random_op(data)
mx.random.seed(128)
r2 = mx.nd.some_custom_random_op(data)
assert (r1 == r2)

This PR does not let the custom library generate exactly the same sequence of random numbers as MXNet.

This is a continuation of the custom operator project apache#15921 and apache#17270
pengzhao-intel pushed a commit that referenced this pull request (Apr 16, 2020)

…18069)

* Dynamic subgraph compile support (#17623)

This PR adds support for passing the NDArrays from the existing optimize_for API down to the reviewSubgraph function in an external library. It also adds a new API for HybridBlock called optimize_for that can partition the model without running a forward pass.

Feature changes
* Adds new API to HybridBlock, optimize_for, that partitions the model but does not call the cachedOp
* Modifies the subgraph library example to optionally require args to be provided
* Adds annotation on subgraph inputs for the name of the original param so that inputs can be mapped, and passes annotations to input nodes of subgraphs
* Adds support for tensors in MKLDNN format, calls Reorder2Default

New tests
* Adds a new test to partition operators that directly consume params
* Adds a new model to test where ops to be partitioned have args/params

Bug fixes
* fixes bug in passing ids vector by value instead of by reference
* fixes bug in passing copies of attributes instead of by reference
* fixes bug where _cached_graph was not updated after partitioning
* fixes memory leak where user-specified attributes on subgraph ops were not freed if subgraph was rejected
* fixes problem incorrectly indexing into shape/dtype maps when annotating the graph

Docs
* Updates the README doc with the latest changes described above

* Adding sparse support to MXTensor for custom operators (#17569)

* Added enum for sparse storage
* Add structure for Dense and Sparse
* redesign the data structure for MXSparse
* pull out aux data from sparse NDArray
* Added more sparse arguments to API interface
* Passed sparse from c_api to lib_api.h and set in MXTensor
* Fix indent
* fix segfault
* Fix NDArray to MXTensor errors
* Add a sample of sparse (CSR) transpose
* Make CSR transpose temporarily work by hardcoding
* Fixed sparse output size (refined)
* Add tests for symbolic and stateful ops
* Added a sample for row sparse transpose
* Added real row sparse transpose
* Fix output size issue by adding lambda for CheckAndAlloc()
* Fix mixed storage formats error
* Added infer storage type function
* resolve comments
* Set inferSType as optional function
* Resolve comments
* Add error messages
* Resolve comments
* verify transpose ops results
* fix sanity check
* update MX_LIBRARY_VERSION to 5

* Custom Operator Random Number Generator Support (#17762)

Add random number generator support for custom operator libraries.

Design: We pass from MXNet the initialized and seeded states, located on CPU and GPU, to the custom library, so the user could use those seeds to generate deterministic values from a given seed passed to MXNet. Basically this workflow:

mx.random.seed(128)
r1 = mx.nd.some_custom_random_op(data)
mx.random.seed(128)
r2 = mx.nd.some_custom_random_op(data)
assert (r1 == r2)

This PR does not let the custom library generate exactly the same sequence of random numbers as MXNet.

This is a continuation of the custom operator project #15921 and #17270

Co-authored-by: guanxinq <[email protected]>
Co-authored-by: Ziyi Mu <[email protected]>
pengzhao-intel pushed a commit that referenced this pull request (Apr 16, 2020), with the same squashed commit message as the commit above.
Description
Add GPU support for custom operators.
This is a continuation of the custom operators project; initial CPU support was implemented in #15921.
Design
Working backward from the user: below is an example of a custom operator forward function for the GPU.
Notice that the function interface is the same as for a CPU operator. The input and output tensors are already in GPU memory, so you do not need to memcpy them to the GPU.
You need the CUDA stream from OpResource in order to launch your CUDA kernel on the correct GPU; mx_stream_t is defined as cudaStream_t when the code is compiled with NVCC.
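A minimal sketch of such a forward function, modeled on the ReLU example in this PR; treat the helper names (relu_gpu_forward, forwardGPU) and the exact kernel body as illustrative rather than definitive:

```cpp
#include "lib_api.h"  // MXNet external-operator API (MXTensor, OpResource, ...)

// Illustrative elementwise ReLU kernel.
__global__ void relu_gpu_forward(float *out, float *in, int64_t N) {
  int tid = blockIdx.x * blockDim.x + threadIdx.x;
  if (tid < N)
    out[tid] = in[tid] > 0 ? in[tid] : 0;
}

// Same interface as a CPU forward function; the tensors already live on the GPU.
MXReturnValue forwardGPU(std::map<std::string, std::string> attrs,
                         std::vector<MXTensor> inputs,
                         std::vector<MXTensor> outputs,
                         OpResource res) {
  float* in_data = inputs[0].data<float>();
  float* out_data = outputs[0].data<float>();

  // mx_stream_t is cudaStream_t under NVCC; launching on this stream runs
  // the kernel on the GPU that owns the input/output tensors.
  mx_stream_t cuda_stream = res.get_cuda_stream();

  int64_t N = inputs[0].size();
  int num_blocks = static_cast<int>((N + 255) / 256);
  relu_gpu_forward<<<num_blocks, 256, 0, cuda_stream>>>(out_data, in_data, N);
  return MX_SUCCESS;
}
```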
The user then registers a single operator with both CPU and GPU computation logic by specifying the device type in the register function, as sketched below. This registration works for any context, since the user only passes in a string.
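A sketch of the dual registration; parseAttrs, inferType, inferShape, and forwardCPU are assumed to be defined elsewhere in the library, and my_relu is an illustrative operator name:

```cpp
// One operator, two forward implementations, keyed by device-type strings.
REGISTER_OP(my_relu)
.setParseAttrs(parseAttrs)
.setInferType(inferType)
.setInferShape(inferShape)
.setForward(forwardCPU, "cpu")
.setForward(forwardGPU, "gpu");
```

Because the dispatch key is just a string, the same registration mechanism can extend to other contexts without changing the API.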
The user then compiles the library with NVCC, similar to compiling the CPU operator library. Python usage is exactly the same as for CPU custom operators.
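For illustration only (the exact build flags are an assumption, not taken from this PR's text): the library would be built with something like nvcc -shared -std=c++11 -Xcompiler -fPIC relu_lib.cu -o librelu_lib.so, then loaded in Python with mx.library.load("librelu_lib.so"), after which the hypothetical mx.nd.my_relu can be called on NDArrays in either a CPU or GPU context.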
This PR is not
Checklist
Essentials
Changes
Comments