-
Notifications
You must be signed in to change notification settings - Fork 6.8k
Subgraph API for integrating accelerators with MXNet #12157
Subgraph API for integrating accelerators with MXNet #12157
Conversation
Thanks @reminisce it's really helpful for us :) @ZhennanQin @lvtao, please take a look the compatibility of current code with our interface :) |
nnvm::NodePtr n = nnvm::Node::Create(); | ||
n->attrs.op = Op::Get("_default_subgraph_op"); | ||
n->attrs.name = "_default_subgraph_op" + std::to_string(subgraph_id); | ||
n->attrs.subgraphs.push_back(std::make_shared<nnvm::Symbol>(sym)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
subgraph symbol is pushed back into n->attrs.subgraphs, how can we guarantee it is at index 0? Because for DefaultSubgraphOpNumInputs(), it always try to get symbol from index 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n
is a newly created node and its attrs.subgraphs
is empty before push_back
is called. It's actually up to you to decide which index is for the sym
in your specific subgraph property. Just make sure you get the correct subgraph in your subgraph operator implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I heard that control dependence will use attrs.subgraphs as well. I'm not sure how this attribute will be used for dependence purpose. Do we have any instruction to define the sequence for attrs.subgraphs?Eg. always put subgraph symbol in index 0, and following dependence purpose node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, there is no fixed convention of doing this. @zheng-da
I don't think you need to worry about this if you can ensure the consistency between your specific subgraph property and subgraph operator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can ensure the consistency between subgraph property and subgraph operator, and we also need to ensure nobody changes it after subgraph partition. It's better to add some check for this consistency in case it breaks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing you can do to enforce the consistency is defining some enums as indices of forward graph, backward graph, fused graph, etc in attrs.subgraphs
, and always use those enums to get the desired subgraphs. It also increases the code readability.
|
||
#define MXNET_REGISTER_SUBGRAPH_PROPERTY(Name, SubgraphPropertyType) \ | ||
static DMLC_ATTRIBUTE_UNUSED auto __make_ ## SubgraphPropertyType ## _ ## Name ## __ = \ | ||
SubgraphPropertyRegistry::Get()->__REGISTER__(#Name, &SubgraphPropertyType::Create) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we implement and call __REGISTER_OR_GET__ here (which checks if it is part of map else calls __REGISTER__). This may be needed because it is possible that __REGISTER__ may be called twice on the same name leading to exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is intended. The registration can only happen once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With __REGISTER_OR_GET__ also register will happen only once. To explain an use case where register wont work: lets say I create a custom subgraph property and call MXNET_REGISTER_SUBGRAPH_PROPERTY from inside the namespace. Now i include this custom subgraph property header file in multiple source files. If they are part of different translation units __REGISTER__ called multiple times and fails since the static variable scope is limited to a single translation unit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good catch. I will fix it. Thanks.
2f1434d
to
3c93b12
Compare
src/imperative/imperative_utils.h
Outdated
@@ -436,7 +436,8 @@ inline void PushFComputeEx(const FComputeEx& fn, | |||
} | |||
}; | |||
|
|||
if (exec_type == ExecType::kCrossDeviceCopy) { | |||
if (exec_type == ExecType::kCrossDeviceCopy | |||
|| exec_type == ExecType::kSubgraphExec) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need kSubgraphExec here. kSubgraphExec only works for stateful operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line is added by you. Nevertheless, isn't _default_subgraph_op
a stateful operator? This should be needed to make sure this operator executes in the main thread in CachedOp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I added this line. It's actually not necessary because this is PushFComputeEx, which only pushes FComputeEx. could you help me remove the line?
src/ndarray/ndarray.cc
Outdated
@@ -39,6 +39,7 @@ | |||
#include "../operator/tensor/matrix_op-inl.h" | |||
#include "../operator/tensor/init_op.h" | |||
#include "../operator/nn/mkldnn/mkldnn_base-inl.h" | |||
#include "../engine/engine_impl.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to include this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
for (size_t i = 0; i < outputs.size(); ++i) { | ||
LOG(INFO) << "outputs[" << i << "].version = " << outputs[i].version(); | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we clean this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed offline, we are going to keep the debugging code for a while.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Removed now.
PrintNodeEntry(*entries[i]); | ||
} | ||
} | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete the debug code?
3c93b12
to
54971ba
Compare
src/engine/naive_engine.cc
Outdated
@@ -165,14 +178,22 @@ class NaiveEngine final : public Engine { | |||
} | |||
CHECK(this->req_completed_) | |||
<< "NaiveEngine only support synchronize Push so far"; | |||
// increment var version | |||
for (auto var : mutable_vars) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In case of deletevariable, will this increment the var version after it is deleted ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Should move this before exec_fun
runs.
++i1; | ||
} | ||
} | ||
return PartitionGraph(src, prop_name, arg_shapes, arg_dtypes, arg_stypes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we check for prop_name here. Currently only default is supported correct ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a check inside CreateSubgraphProperty.
} | ||
virtual SubgraphSelectorPtr CreateSubgraphSelector() const { | ||
return std::make_shared<ContainOpSelector>( | ||
this->GetAttr<std::unordered_set<std::string>>("op_names")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we override GetAttr for DefaultSubgraphProperty, so that it informs the user that the set has to be provided by the user using C API to test default subgraph property? This will also be a good example for developers writing custom subgraph property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a templated method. I don't think we can make it a virtual method. but I don't really know what you mean here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To explain my use case: Lets say a user forgot to call setattr for op_names when using the default_subgraph_property. It would fial with the error that op_names cannot be found in subgraph property. A better error message would have been useful. There may be such cases where classes subclassing SubgraphProperty may want to also implement logic for SetAttr and GetAttr in the subclass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not meant for users to use and is there only for the flexible testing purpose. Furthermore, template method cannot be virtual. But I still cannot understand your use case. How different could SetAttr and GetAttr be in a subclass than the base?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for example you only want certain attributes to be set for the subgraph property and you want to do some sanity check on whether the attrs can be supported or not. For example supported op_names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for your use case, i think it's fine without customization. Users of subgraph API can write their own code to do sanity check. The main problem is a templated method can't be virtual.
const std::map<std::string, Context>& ctx_map, | ||
const std::vector<Context>& in_arg_ctxes, | ||
const std::vector<Context>& aux_state_ctxes) { | ||
auto subgraph_prop = op::SubgraphPropertyRegistry::Get()->CreateSubgraphProperty(prop_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If prop_name
doesn't exist, we need to handle the failure of creating a subgraph property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a check inside CreateSubgraphProperty
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see.
++i1; | ||
} | ||
} | ||
return PartitionGraph(src, prop_name, arg_shapes, arg_dtypes, arg_stypes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
virtual SubgraphSelectorPtr CreateSubgraphSelector() const { | ||
return std::make_shared<ContainOpSelector>( | ||
this->GetAttr<std::unordered_set<std::string>>("op_names")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a templated method. I don't think we can make it a virtual method. but I don't really know what you mean here.
8435acf
to
964d6e2
Compare
Let's not put this in example since the CAPIs are not meant to be called by users |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
@@ -42,6 +43,7 @@ using namespace mxnet::common; | |||
GraphExecutor::GraphExecutor() { | |||
log_verbose_ = dmlc::GetEnv("MXNET_EXEC_VERBOSE_LOGGING", false); | |||
need_grad_ = false; | |||
subgraph_property_ = dmlc::GetEnv("MXNET_SUBGRAPH_BACKEND", std::string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we also add MXNET_SUBGRAPH_BACKEND to env_var.md?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, there is no real accelerator integrated with MXNet. Intel team is going to submit their work on MKLDNN using this API set. We can add this evn var along with their PR.
return ret; | ||
} | ||
|
||
inline std::vector<ResourceRequest> DefaultSubgraphOpResourceRequest(const nnvm::NodeAttrs& attrs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function aggregates all resources requested by the subgraph. What are these acquired resources distributed to the individual operators in the subgraph?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a convenience function for developers to register with their own subgraph operators. They may or may not use this function for registration depending on their implementation. For the _defafult_subgraph_operator
, this is actually not necessary to be there. I have removed it from the registering _defafult_subgraph_operator
.
// Determine if the node should be selected for a subgraph. | ||
virtual bool Select(const nnvm::Node &n) = 0; | ||
// Determine if the input node should be selected for a subgraph. | ||
virtual bool SelectInput(const nnvm::Node &n, const nnvm::Node &new_node) = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you document what are n
and new_node
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work guys ! @reminisce @zheng-da
const size_t max_num_retry = simple_nodes.size() * simple_nodes.size(); | ||
size_t count = 0; | ||
bool success = false; | ||
while (!success && count < max_num_retry) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain the logic behind ssetting max_num_retry to square of simple_nodes.size(). Shouldnt it just be simple_nodes.size()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Theoretically, yes. But just want to be safe here.
} | ||
for (const auto& entry : top->inputs) { | ||
// when searching for the ancestor, the path cannot cross any subgraph node | ||
auto it = std::find(snodes.begin(), snodes.end(), entry.node.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe add check for it == snodes.end() since it wont happen that it will find the input node in the subgraph nodes ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either case is possible here, so cannot add the CHECK
.
a1aef90
to
3a722c4
Compare
LGTM |
Graph partitioner and subgraph op Fix duplicate entry bugs (apache#11767) Make subgraph var node name unique (apache#11876) [DO NOT REVIEW] Fix bug of eliminating cycles (apache#11907) * Fix cycle bug * Fix decycle bug * Fix comment [DO NOT REVIEW] Subgraph API (apache#12104) * Initial commit * Add unit tests * Fix lint * Fix lint * Clean up * Add graph partitiong to Bind * Add property name to graph partitioning c api * Fix unit test gpu context * Address cr * Move subgraph to attrs.subgraphs and fix the example * Fix lint * Add var version unit test * Address cr * Enable unit test that was flaky
3a722c4
to
ae2b480
Compare
Looks like all comments are addressed. Merging it now |
* Graph partitioner and subgraph op (apache#11251) Graph partitioner and subgraph op Fix duplicate entry bugs (apache#11767) Make subgraph var node name unique (apache#11876) [DO NOT REVIEW] Fix bug of eliminating cycles (apache#11907) * Fix cycle bug * Fix decycle bug * Fix comment [DO NOT REVIEW] Subgraph API (apache#12104) * Initial commit * Add unit tests * Fix lint * Fix lint * Clean up * Add graph partitiong to Bind * Add property name to graph partitioning c api * Fix unit test gpu context * Address cr * Move subgraph to attrs.subgraphs and fix the example * Fix lint * Add var version unit test * Address cr * Enable unit test that was flaky * Clean up * Clean up * Clean up * Change version return type in NDArray * Clean up * Add register or get for subgraph prop registry * Address cr * Remove unnecessary code * Handle var version issue in naive engine * Delete example * Remove registration of resource request for default subgraph op * Add doc string * Improve doc string
* Graph partitioner and subgraph op (apache#11251) Graph partitioner and subgraph op Fix duplicate entry bugs (apache#11767) Make subgraph var node name unique (apache#11876) [DO NOT REVIEW] Fix bug of eliminating cycles (apache#11907) * Fix cycle bug * Fix decycle bug * Fix comment [DO NOT REVIEW] Subgraph API (apache#12104) * Initial commit * Add unit tests * Fix lint * Fix lint * Clean up * Add graph partitiong to Bind * Add property name to graph partitioning c api * Fix unit test gpu context * Address cr * Move subgraph to attrs.subgraphs and fix the example * Fix lint * Add var version unit test * Address cr * Enable unit test that was flaky * Clean up * Clean up * Clean up * Change version return type in NDArray * Clean up * Add register or get for subgraph prop registry * Address cr * Remove unnecessary code * Handle var version issue in naive engine * Delete example * Remove registration of resource request for default subgraph op * Add doc string * Improve doc string
Description
This is a joint work of @reminisce and @zheng-da on implementing subgraph API for integrating backend accelerators (TVM, MKLDNN, TensorRT, nGraph, etc.) with MXNet. See design proposal.
Highlighted deliverables
A graph partitioning algorithm accepting custom partitioning rules (through subclassing
SubgraphProperty
).A naive graph partitioning rule called
DefaultSubgraphProperty
partitioning graphs based upon provided operator names. For example, in the left graph, operators B and C are designated to be grouped into a subgraph operator node. After applying the partitioning algorithm, nodes B and C are replaced by a subgraph operator node in the graph on the right-hand-side.DefaultSubgraphOperator
. A stateful operator storing a subgraph for execution byCachedOp
.NDArray
version number for tracking mutations ofNDArray
s. This is useful, for example, in a stateful MKLDNN subgraph operator for checking whether input data has changed since the last forward call to decide if to trigger data format conversion.Graph partitioning enabled in
GraphExecutor
for the forward pass. Triggered by an environment variableMXNET_SUBGRAPH_BACKEND
.Coming soon after this PR
DefaultSubgraphOperator
.Checklist
Essentials
Please feel free to remove inapplicable items for your PR.