…to dynamic_subgraph_prop
added partitioner registration to c_api
…to dynamic_subgraph_prop
example is printing
…to dynamic_subgraph_prop
@mseth10 @rondogency @szha @ptrendx @PatricZhao @TaoLv @ZhennanQin Initial PoC is working. It would be great to get some early feedback before we go too far down the wrong road. Some more things to do:
…w subgraphProperties are grouped into subgraphBackends
Reviewed recent changes to allow library to add attributes to subgraph. Looks good to me.
…to dynamic_subgraph_prop
Thanks for the review @eric-haibin-lin! I've made changes based on your feedback and updated the PR description with "Next Steps" for some of the todo items you suggested.
You might want to have GitHub issues or a project to track the todos and the progress of this project.
Some final nitpicks.
@mxnet-label-bot update [pr-awaiting-merge]
Apologies @eric-haibin-lin, my mistake on the comments. Thanks for the suggestions. I've gone through and changed the variable names to use the underscore format in the custom_subgraph_property.h file.
…to dynamic_subgraph_prop
Hi @samskalicky, is this PR ready to merge?
@wkcn ready to go!
Merged. Thank you!
Description
Initial PR for supporting dynamic loading of subgraph properties from libraries. This PR builds on the previous work for dynamic library loading (#15760), partition API enhancements (#15886), and dynamic custom operators (#15921). It enables partitioning a model using a partitioning strategy loaded from an external library.
The model's symbolic graph is analyzed at bind time to determine which operators should be partitioned into a subgraph. This info is kept in a custom SubgraphProperty class instance until partition time, when the custom SubgraphProperty uses it to guide partitioning of the model and to insert the custom subgraph operator specified in the external library.
At runtime, the operators in the model will be executed normally, including the custom subgraph operator. At this time, the subgraph can be executed by the external library just like any regular operator. This provides an interface for parts of the model to be executed on custom accelerators without any change to MXNet's source code (specifically for each accelerator).
Design
Rather than provide a pure subgraph property interface to external libraries, we will provide a single API function that the user will implement to control the partitioning. The whole symbolic graph will be provided to the user -- post infer type & infer shape -- to analyze and determine which ops they want to include in subgraphs. They will return the node ID for each node in the graph that is supported. In this PR we implement a fixed custom SubgraphProperty in MXNet that will interface between the current Subgraph API and this streamlined "supportedOps" API in the external library.
Here's an end-to-end overview, starting at the Python user's end. First, a user will load their custom library containing a custom subgraph operator and the implemented "supportedOps" API.
Then they will call the "optimize_for" API (from #15886) that will use the custom subgraph property to partition the graph. The name here is the name the user specified when registering their "supportedOps" function in the external library.
This will then call the SubgraphProperty's "PrePartition" API and pass the whole model graph (post infer type/shape) to the custom subgraph backend/property.
https://github.com/apache/incubator-mxnet/blob/61013a8bf9ef8a7b79d684504df1b321b1efb8d8/src/c_api/c_api_symbolic.cc#L1305
The custom subgraph property's "PrePartition" API then calls the "supportedOps" API registered in the external library.
The "supportedOps" API receives the symbol JSON string as input, and any node that is supported by the external library is marked in a list of node IDs.
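The real "supportedOps" function is implemented in C++ inside the external library. The following is only a minimal Python sketch of the logic it is described as performing, not the library's API. It assumes the usual MXNet symbol JSON layout (a "nodes" list whose entries carry "op" and "name"), represents the node ID list as one 0/1 flag per node, and uses a hypothetical `SUPPORTED_OP_NAMES` set and function name.

```python
import json

# Hypothetical set of operators the external library can execute.
SUPPORTED_OP_NAMES = {"exp", "log"}


def supported_ops(symbol_json):
    """Return one 0/1 flag per node, indexed by node ID (position in "nodes")."""
    graph = json.loads(symbol_json)
    flags = []
    for node in graph["nodes"]:
        # Variables (model inputs and parameters) have op == "null",
        # so they are never claimed by the library.
        flags.append(1 if node["op"] in SUPPORTED_OP_NAMES else 0)
    return flags


# A tiny hand-written graph: a -> exp -> sqrt -> log
example_json = json.dumps({
    "nodes": [
        {"op": "null", "name": "a", "inputs": []},
        {"op": "exp", "name": "exp0", "inputs": [[0, 0, 0]]},
        {"op": "sqrt", "name": "sqrt0", "inputs": [[1, 0, 0]]},
        {"op": "log", "name": "log0", "inputs": [[2, 0, 0]]},
    ],
    "arg_nodes": [0],
    "heads": [[3, 0, 0]],
})

flags = supported_ops(example_json)
print(flags)
```

Because the graph is passed after type and shape inference, a real implementation could also inspect per-node attributes before claiming a node; this sketch only looks at the operator name.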
Then, in the "PrePartition" function, these node IDs are converted back into node names.
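The ID-to-name conversion can be illustrated with a short Python sketch. This is an assumption about the shape of the step, not the C++ code in the PR: it treats a node's ID as its index in the symbol JSON "nodes" list, and the function name `supported_node_names` is hypothetical.

```python
import json


def supported_node_names(symbol_json, flags):
    """Map per-node 0/1 flags back to the names of the supported nodes."""
    nodes = json.loads(symbol_json)["nodes"]
    if len(flags) != len(nodes):
        raise ValueError("expected exactly one flag per node")
    # Node ID == index into the "nodes" list (assumed).
    return [node["name"] for node, flag in zip(nodes, flags) if flag]


example_json = json.dumps({
    "nodes": [
        {"op": "null", "name": "a", "inputs": []},
        {"op": "exp", "name": "exp0", "inputs": [[0, 0, 0]]},
        {"op": "sqrt", "name": "sqrt0", "inputs": [[1, 0, 0]]},
        {"op": "log", "name": "log0", "inputs": [[2, 0, 0]]},
    ]
})

names = supported_node_names(example_json, [0, 1, 0, 1])
print(names)
```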
The supportedNodes vector is passed in when creating the SubgraphSelector.
The SubgraphSelector's "Select" function then checks whether a given node should be included in the subgraph.
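A name-based selector of this kind can be sketched in a few lines of Python. The class below is hypothetical and only mirrors the idea described above: the real selector is a C++ SubgraphSelector that receives graph nodes rather than plain strings, and the `select_input`/`select_output` methods here are an assumption modeled on that interface.

```python
class NameBasedSelector:
    """Selects nodes whose names were reported as supported by the library."""

    def __init__(self, supported_nodes):
        # Store names in a set for O(1) membership checks.
        self.supported_nodes = set(supported_nodes)

    def select(self, node_name):
        # Seed check: may a subgraph start at this node?
        return node_name in self.supported_nodes

    def select_input(self, node_name, input_name):
        # Growth check: may the subgraph extend to a producer of node_name?
        return input_name in self.supported_nodes

    def select_output(self, node_name, output_name):
        # Growth check: may the subgraph extend to a consumer of node_name?
        return output_name in self.supported_nodes


selector = NameBasedSelector(["exp0", "log0"])
picked = [n for n in ["a", "exp0", "sqrt0", "log0"] if selector.select(n)]
print(picked)
```

In this sketch "exp0" and "log0" are both selected, but they would end up in separate subgraphs because the unsupported "sqrt0" sits between them.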
Finally, after the subgraph is created, the operator inserted for it is the one the user specified as their custom subgraph operator.
At runtime, the regular user operator is called resulting in the execution of the custom operator for the subgraph that was partitioned.
Users register their partitioning strategy through a registration API in which they specify the name of their subgraphBackend (and subgraphProperty), the supportedOps function, and the name of the custom operator they want inserted for each subgraph created.
Given that the MXNet subgraph API performs de-cycling and other checks, the subgraph reviewed by the "supportedOps" API may differ from the final subgraph. To give the library an additional option to reject the final subgraph combination, we added an additional API, "acceptSubgraph", that the library creator can implement. This is an optional API and, if implemented, will be called from the "CreateSubgraphNode" API in the MXNet subgraph property. If accepted, the subgraph op will be created and inserted into the graph. If not, we'll reattach the subgraph inputs to the graph (reversing the functionality of CutGraphInputs in build_subgraph.cc), returning the graph to its original state.
This PR also sets an attribute, "isArg", on each subgraph input node. It is "True" if the subgraph input is also an input to the model (and not an output of some other operator in the model). This can be used by the accelerator library to avoid unnecessary data movement for the same data, or to execute further optimizations knowing that a particular subgraph input will be unchanged between calls.
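The "isArg" rule can be illustrated with a small Python sketch. It is not the PR's C++ implementation: it assumes MXNet-style JSON nodes whose "inputs" entries begin with the producing node's ID, treats any node with op "null" as a model input, and assumes the attribute is the string "False" otherwise. The function name `mark_is_arg` is hypothetical.

```python
def mark_is_arg(nodes, subgraph_ids):
    """Return {input_name: "True"/"False"} for every external input of a subgraph."""
    inside = set(subgraph_ids)
    is_arg = {}
    for node_id in subgraph_ids:
        for entry in nodes[node_id]["inputs"]:
            src_id = entry[0]  # first field of an input entry is the producer's ID
            if src_id in inside:
                continue  # internal edge, not a subgraph input
            src = nodes[src_id]
            # op == "null" marks a variable, i.e. an input to the whole model.
            is_arg[src["name"]] = "True" if src["op"] == "null" else "False"
    return is_arg


nodes = [
    {"op": "null", "name": "a", "inputs": []},
    {"op": "exp", "name": "exp0", "inputs": [[0, 0, 0]]},
    {"op": "sqrt", "name": "sqrt0", "inputs": [[1, 0, 0]]},
    {"op": "log", "name": "log0", "inputs": [[2, 0, 0]]},
]

first = mark_is_arg(nodes, [1])   # subgraph containing only exp0
second = mark_is_arg(nodes, [3])  # subgraph containing only log0
print(first, second)
```

Here the first subgraph reads the model input "a" directly, so a library could keep that buffer resident between calls, while the second subgraph's input is produced by another operator on every run.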
Next Steps