[MXNET-703] TensorRT runtime integration #11325
Changes from 8 commits
A new Dockerfile (41 lines) for the TensorRT CI build:

```dockerfile
# -*- mode: dockerfile -*-
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# Dockerfile to run MXNet on Ubuntu 16.04 for GPU with TensorRT

FROM nvidia/cuda:9.0-cudnn7-devel

WORKDIR /work/deps

COPY install/ubuntu_core.sh /work/
RUN /work/ubuntu_core.sh
COPY install/deb_ubuntu_ccache.sh /work/
RUN /work/deb_ubuntu_ccache.sh
COPY install/ubuntu_python.sh /work/
RUN /work/ubuntu_python.sh
COPY install/tensorrt.sh /work
RUN /work/tensorrt.sh

ARG USER_ID=0
COPY install/ubuntu_adduser.sh /work/
RUN /work/ubuntu_adduser.sh

COPY runtime_functions.sh /work/

WORKDIR /work/mxnet
ENV LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib
```
New file `install/tensorrt.sh` (45 lines), the dependency install script invoked by the Dockerfile above:

```bash
#!/bin/bash

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

# Install gluoncv since we're testing Gluon models as well
pip2 install gluoncv==0.2.0
pip3 install gluoncv==0.2.0

# Install Protobuf
# Install protoc 3.5 and build protobuf here (for onnx and onnx-tensorrt)
pushd .
cd ..
apt-get update
apt-get install -y automake libtool
git clone --recursive -b 3.5.1.1 https://github.com/google/protobuf.git
cd protobuf
./autogen.sh
./configure
make -j$(nproc)
make install
ldconfig
popd

# Install TensorRT
echo "TensorRT build enabled. Installing TensorRT."
wget -qO tensorrt.deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvinfer-runtime-trt-repo-ubuntu1604-4.0.1-ga-cuda9.0_1-1_amd64.deb
dpkg -i tensorrt.deb
apt-get update
apt-get install -y --allow-downgrades libnvinfer-dev
rm tensorrt.deb
```
In `include/mxnet/c_api.h`, the PR adds a C API entry point for retrieving the optimized graph from a graph executor:

```diff
@@ -1714,6 +1714,13 @@ MXNET_DLL int MXExecutorReshape(int partial_shaping,
                                 NDArrayHandle** aux_states,
                                 ExecutorHandle shared_exec,
                                 ExecutorHandle *out);
+
+/*!
+ * \brief get optimized graph from graph executor
+ */
+MXNET_DLL int MXExecutorGetOptimizedSymbol(ExecutorHandle handle,
+                                           SymbolHandle *out);
+
 /*!
  * \brief set a call back to notify the completion of operation
  */
```

Review discussion:

- why do we need to expose this?
- @piiswrong See my reply to @eric-haibin-lin here.
- @piiswrong What is your take now that you have the context?
- I think it's better to expose it as a private member of the executor.
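For illustration, here is a minimal sketch of how a frontend consumes this entry point through ctypes, mirroring what the Python executor does later in this PR. Assumptions: an MXNet build that includes this change, and an already-bound executor; the helper name is hypothetical.

```python
import ctypes

from mxnet.base import _LIB, SymbolHandle, check_call


def get_optimized_symbol_handle(executor):
    """Fetch the optimized graph from a bound executor via the new C API.

    `executor.handle` is the ExecutorHandle owned by an already-bound
    executor; the returned SymbolHandle can be wrapped in a frontend Symbol.
    """
    handle = SymbolHandle()
    check_call(_LIB.MXExecutorGetOptimizedSymbol(executor.handle,
                                                 ctypes.byref(handle)))
    return handle
```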
In `include/mxnet/executor.h`, the arguments of `Executor::SimpleBind` that the TensorRT graph pass needs to mutate change from const references to pointers:

```diff
@@ -152,14 +152,14 @@ class Executor {
   static Executor* SimpleBind(nnvm::Symbol symbol,
                               const Context& default_ctx,
                               const std::map<std::string, Context>& group2ctx,
-                              const std::vector<Context>& in_arg_ctxes,
-                              const std::vector<Context>& arg_grad_ctxes,
-                              const std::vector<Context>& aux_state_ctxes,
-                              const std::unordered_map<std::string, TShape>& arg_shape_map,
-                              const std::unordered_map<std::string, int>& arg_dtype_map,
-                              const std::unordered_map<std::string, int>& arg_stype_map,
-                              const std::vector<OpReqType>& grad_req_types,
-                              const std::unordered_set<std::string>& param_names,
+                              std::vector<Context>* in_arg_ctxes,
+                              std::vector<Context>* arg_grad_ctxes,
+                              std::vector<Context>* aux_state_ctxes,
+                              std::unordered_map<std::string, TShape>* arg_shape_map,
+                              std::unordered_map<std::string, int>* arg_dtype_map,
+                              std::unordered_map<std::string, int>* arg_stype_map,
+                              std::vector<OpReqType>* grad_req_types,
+                              std::unordered_set<std::string>* param_names,
                               std::vector<NDArray>* in_args,
                               std::vector<NDArray>* arg_grads,
                               std::vector<NDArray>* aux_states,
```

Review discussion:

- This breaks API backward compatibility. You can define another simple bind function such as `SimpleBindEx` to achieve your purpose.
- @reminisce: I'm not sure that I'd consider this a breaking API change. My understanding from what was discussed on the dev list was that we would follow the hourglass model for the C API and make the C API the only native interface that we semantically version.
- @KellenSunderland You've got a valid point. My worry was that since this function is declared in the header placed in the […]
- @reminisce Yes, I do see what you mean. It's certainly possible someone used the header. As long as it's not too much work, would you mind creating a second function, @mkolod?
- I think we don't need to maintain compatibility on this. It's unlikely any user would depend on it.
- Thanks @piiswrong!
- @mkolod As we discussed offline about not breaking existing APIs, could you create a simple bind API for your use only, rather than modifying this one?
- why change reference to pointer?
- @piiswrong I think we already discussed this here. The TensorRT graph pass does a graph rewrite, and that requires mutable types. Going from const ref to ref causes the linter to fail; the linter's suggestion is to switch from const refs to pointers if mutability is required. You mentioned here that this was acceptable from an API-stability point of view, because no changes are necessary in the C API, only in the C++ one.
- @piiswrong The method that requires mutability is `ReinitGraph()`. The call path here is `SimpleBind() -> Init() -> ReinitGraph()` (see here). This doesn't pose any breaking changes to the existing code base, as long as the C API is used to handle frontend bindings.
- @piiswrong Also, I presume the lines in question are 155-162, not 152. Those are the ones changed by this commit; lines 152-154 and 163-168 are as authored on 2017-06-02 by @reminisce.
- @piiswrong Thoughts about the above?
- why not pass by value?
- Also, I think it's better to name the function `InitTensorRT` rather than `ReinitGraph`.
- Question: what is the purpose of modifying these arguments by passing pointers?
- @reminisce Because if things are to be mutated, they need to be pointers, not non-const references (per the linter rules). Given your earlier comments about `SimpleBindEx` rather than modifying `SimpleBind`, this will be addressed there rather than modifying it here.
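As a sketch of why the frontend is unaffected: Python reaches `SimpleBind` only through the C API (the hourglass model), so binding code like the following keeps working unchanged. The network and shapes here are illustrative, not from the PR:

```python
import mxnet as mx

# A small symbolic network; the Python-level simple_bind goes through the
# C API, so the C++ SimpleBind signature change is invisible here.
data = mx.sym.Variable('data')
fc = mx.sym.FullyConnected(data, num_hidden=10, name='fc1')
net = mx.sym.SoftmaxOutput(fc, name='softmax')

# Bind for inference only (no gradient buffers are allocated).
exe = net.simple_bind(ctx=mx.cpu(), data=(32, 100), grad_req='null')
```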
In `python/mxnet/executor.py`, the executor imports `SymbolHandle` and initializes a cached `_optimized_symbol`:

```diff
@@ -24,8 +24,9 @@
 import ctypes
 import copy
 import numpy as np
+import mxnet as mx
 from .base import _LIB
-from .base import mx_uint, NDArrayHandle, ExecutorHandle, py_str
+from .base import mx_uint, NDArrayHandle, ExecutorHandle, SymbolHandle, py_str
 from .base import check_call, c_handle_array, c_array_buf, c_str_array
 from .ndarray import NDArray
 from .ndarray import _ndarray_cls
@@ -73,6 +74,7 @@ def __init__(self, handle, symbol, ctx, grad_req, group2ctx):
         self.aux_arrays = []
         self.outputs = self._get_outputs()
         self._symbol = copy.deepcopy(symbol)
+        self._optimized_symbol = None
         self._arg_dict = None
         self._grad_dict = None
         self._aux_dict = None
```
A lazily evaluated `optimized_symbol` property exposes the rewritten graph:

```diff
@@ -323,6 +325,21 @@ def output_dict(self):
                                              self._symbol.list_outputs(), self.outputs)
         return self._output_dict
 
+    @property
+    def optimized_symbol(self):
+        """Get the optimized symbol.
+
+        Returns
+        -------
+        symbol : nnvm::Symbol
+            The optimized nnvm symbol.
+        """
+        if self._optimized_symbol is None:
+            handle = SymbolHandle()
+            check_call(_LIB.MXExecutorGetOptimizedSymbol(self.handle, ctypes.byref(handle)))
+            self._optimized_symbol = mx.sym.Symbol(handle=handle)
+        return self._optimized_symbol
+
     def copy_params_from(self, arg_params, aux_params=None, allow_extra_params=False):
         """Copy parameters from arg_params, aux_params into executor's internal array.
```

Review discussion on `optimized_symbol`:

- When would a user want to access `optimized_symbol`? Is this added only for testing purposes? Does it make sense to keep it private?
- @eric-haibin-lin For now this is added for testing only, but it's actually quite useful, and not just for the TensorRT integration but potentially for other acceleration libraries in the future. Imagine wanting to visualize a model to figure out what the acceleration library did. Looking at the timeline may be helpful, but only if one knows which kernels belong to the accelerator and which to MXNet or direct cuBLAS/cuDNN calls. Seeing how the graph got rewritten can help determine which layers could not be handled by the acceleration library, to determine what asks to make of libraries going forward. For example, TensorRT doesn't handle non-maximum suppression (NMS) directly; it can do that via a plugin, and this initial integration doesn't handle TensorRT plugins. Knowing what didn't get subsumed into the accelerator graph node and comparing against a profile can determine whether, for instance, layers not handled by the acceleration library subgraph are dominating the profile; a visual inspection makes this really quick. This could be useful for the Graphviz integration as well as potentially for TensorBoard. We tested this with GraphViz already, and it's working fine. Check out the attached visualization of LeNet-5: the graph on the left is the original MXNet one, and the one on the right has all layers except the input tensor replaced by a single TensorRT node. For a more complex network such as SSD, it would be visible that the NMS layer is a separate NNVM node outside of the TensorRT subgraph. Basically, once you get the optimized symbol, you can also call `mx.viz.plot_network` on it as you would for the original symbol.
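For illustration, a minimal sketch of that visualization workflow, assuming an MXNet build with the TensorRT pass enabled and a hypothetical pre-trained checkpoint prefix `'lenet5'`:

```python
import mxnet as mx

# Load a hypothetical LeNet-5 checkpoint and bind it for inference.
sym, arg_params, aux_params = mx.model.load_checkpoint('lenet5', 0)
exe = sym.simple_bind(ctx=mx.gpu(0), data=(1, 1, 28, 28), grad_req='null')

# Render the original graph and the rewritten one; layers subsumed by
# TensorRT collapse into a single TensorRT node in the optimized symbol.
mx.viz.plot_network(sym).render('original_graph')
mx.viz.plot_network(exe.optimized_symbol).render('optimized_graph')
```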
On the `check_call` line inside the property, reviewers caught a caching bug in an earlier revision:

- what happens if […]
- That is a mistake indeed: `_optimized_symbol` was never modified, so it would never exist. Corrected.
Final review discussion:

- We spoke offline about this, but just a quick note that we should also add the ability to build the MXNet-TensorRT integration in our CMake builds.
- @KellenSunderland I agree. Should the CMake build be part of the initial PR or a subsequent one?
- @KellenSunderland ping ^^
- I think either way would work.