Update TensorRT tutorial to build-from-source. #14860

KellenSunderland · 2019-05-02T17:51:22Z

Description

Update TensorRT tutorial to build-from-source.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
[] Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

KellenSunderland · 2019-05-02T17:58:56Z

FYI @Caenorst @aaronmarkham

vandanavk · 2019-05-09T05:26:49Z

@mxnet-label-bot add [pr-work-in-progress]

aaronmarkham

A few suggestions...

aaronmarkham · 2019-05-14T23:43:04Z

docs/tutorials/tensorrt/inference_with_trt.md

-```
-
-If you are running an operating system other than Ubuntu 16.04, or just prefer to use a docker image with all prerequisites installed you can instead run:
+If you are running an operating system other than Ubuntu 18.04, or just prefer to use a docker image with all prerequisites installed you can instead run:


I think this is missing a pull step.

Run will do a pull.

I think this is still missing.

aaronmarkham · 2019-05-14T23:47:44Z

docs/tutorials/tensorrt/inference_with_trt.md

-For this experiment we are strictly interested in inference performance, so to simplify the benchmark we'll pass a tensor filled with zeros as an input.  We then bind a symbol as usual, returning a normal MXNet executor, and we run forward on this executor in a loop.  To help improve the accuracy of our benchmarks we run a small number of predictions as a warmup before running our timed loop.  This will ensure various lazy operations, which do not represent real-world usage, have completed before we measure relative performance improvement.  On a modern PC with a Titan V GPU the time taken for our MXNet baseline is **33.73s**.  Next we'll run the same model with TensorRT enabled, and see how the performance compares.
-
-While TensorRT integration remains experimental, we require users to set an environment variable to enable graph compilation.  You can see that at the start of this test we explicitly disabled TensorRT graph compilation support.  Next, we will run the same predictions using TensorRT.  This will require us to explicitly enable the MXNET_USE_TENSORRT environment variable, and we'll also use a slightly different API to bind our symbol.
+For this experiment we are strictly interested in inference performance, so to simplify the benchmark we'll pass a tensor filled with zeros as an input.  We then bind a symbol as usual, returning a normal MXNet executor, and we run forward on this executor in a loop.  To help improve the accuracy of our benchmarks we run a small number of predictions as a warmup before running our timed loop.  This will ensure various lazy operations, which do not represent real-world usage, have completed before we measure relative performance improvement.  On a modern PC with an RTX 2070 GPU the time taken for our MXNet baseline is **17.20s**.  Next we'll run the same model with TensorRT enabled, and see how the performance compares.


I think this could be simplified. Do you have to include so much detail for a toy model?

Suggested change

-For this experiment we are strictly interested in inference performance, so to simplify the benchmark we'll pass a tensor filled with zeros as an input. We then bind a symbol as usual, returning a normal MXNet executor, and we run forward on this executor in a loop. To help improve the accuracy of our benchmarks we run a small number of predictions as a warmup before running our timed loop. This will ensure various lazy operations, which do not represent real-world usage, have completed before we measure relative performance improvement. On a modern PC with an RTX 2070 GPU the time taken for our MXNet baseline is **17.20s**. Next we'll run the same model with TensorRT enabled, and see how the performance compares.

+For this experiment we are strictly interested in inference performance, so to simplify the benchmark we'll pass a tensor filled with zeros as an input. We will also ensure various lazy operations are excluded from the benchmark by performing a warmup before running our timed loop. This does not represent real-world usage, but will provide a basic benchmark. On a modern PC with an RTX 2070 GPU the time taken for our MXNet baseline is **17.20s**. Next we'll run the same model with TensorRT enabled, and see how the performance compares.

Good feedback, I'll try and simplify.
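
For reference, the warmup-then-time pattern discussed in this thread can be sketched roughly as below. This is only an illustration, not the tutorial's code: `benchmark` and `fake_forward` are hypothetical names, the iteration counts are arbitrary, and a pure-Python stand-in replaces the executor's forward call so the snippet runs without MXNet or a GPU.

```python
import time


def benchmark(run_once, warmup_iters=10, timed_iters=100):
    """Return the wall-clock seconds spent on `timed_iters` calls of `run_once`.

    The warmup calls run first and are not timed, so one-off lazy
    initialization does not inflate the measurement.
    """
    for _ in range(warmup_iters):
        run_once()

    start = time.perf_counter()
    for _ in range(timed_iters):
        run_once()
    return time.perf_counter() - start


def fake_forward():
    # Hypothetical stand-in for a forward pass on the executor.
    sum(i * i for i in range(1000))


elapsed = benchmark(fake_forward)
print(f"{elapsed:.4f}s for 100 timed iterations")
```

With a real MXNet executor the callable would also need to block until the output is ready, because MXNet executes operations asynchronously; otherwise the loop would mostly measure how fast work is queued.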

abhinavs95 · 2019-05-30T21:05:30Z

@KellenSunderland Could you have a look at the review comments? Thanks!

vandanavk · 2019-06-16T19:15:54Z

@mxnet-label-bot update [pr-work-in-progress]

samskalicky · 2019-08-13T16:12:40Z

docs/tutorials/tensorrt/inference_with_trt.md

-Instead of calling simple_bind directly on our symbol to return an executor, we call an experimental API from the contrib module of MXNet. This call is meant to emulate the simple_bind call, and has many of the same arguments.  One difference to note is that this call takes params in the form of a single merged dictionary to assist with a tensor cleanup pass that we'll describe below.
-
-As TensorRT integration improves our goal is to gradually deprecate this tensorrt_bind call, and allow users to use TensorRT transparently (see the [Subgraph API](https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN) for more information).  When this happens, the similarity between tensorrt_bind and simple_bind should make it easy to migrate your code.
+We us a few TensorRT specific API calls from the contrib package here to setup our parameters and indicate we'd like to run inference in fp16 mode. We then call simple_bind as normal and copy our parameter dictionaries to our executor.


'We us' ==> ?

samskalicky · 2019-08-13T16:15:29Z

docs/tutorials/tensorrt/inference_with_trt.md

-## Future Work
-As mentioned above, MXNet developers are excited about the possibilities of [creating APIs](https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN) that deal specifically with subgraphs.  As this work matures it will bring many improvements for TensorRT users.  We hope this will also be an opportunity for other acceleration libraries to integrate with MXNet.
+## Subgraph API
+As of MXNet 1.5, MXNet developers have integrated TensorRT with MXNet via a Subgraph API.  Read more about the design of the API [here](https://cwiki.apache.org/confluence/display/MXNET/MXNet+Graph+Optimization+and+Quantization+based+on+subgraph+and+MKL-DNN).

 ## Thanks
 Thank you to NVIDIA for contributing this feature, and specifically thanks to Marek Kolodziej and Clement Fuji-Tsang.  Thanks to Junyuan Xie and Jun Wu for the code reviews and design feedback, and to Aaron Markham for the copy review.


'Thank you to NVIDIA' ==> Thanks to NVIDIA

Caenorst · 2019-08-14T16:57:08Z

docs/tutorials/tensorrt/inference_with_trt.md

-executor = mx.contrib.tensorrt.tensorrt_bind(sym, ctx=mx.gpu(0), all_params=all_params,
-                                             data=batch_shape, grad_req='null', force_rebind=True)
+trt_sym = sym.get_backend_symbol('TensorRT')
+mx.contrib.tensorrt.init_tensorrt_params(trt_sym, arg_params, aux_params)


The fact that init_tensorrt_params modifies its arg_params and aux_params inputs is unwanted behavior that I intend to fix. Please use the returned arg_params / aux_params instead.

Gotcha, will do. Thanks @Caenorst
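
A rough sketch of what the requested change could look like, assuming `init_tensorrt_params` returns an `(arg_params, aux_params)` pair as the comment above implies. The helper name `bind_with_tensorrt` is hypothetical, the `simple_bind` arguments are carried over from the removed `tensorrt_bind` call in the diff, and the parameter copy follows the tutorial text ("copy our parameter dictionaries to our executor"). MXNet is imported inside the function so the definition loads without MXNet installed; actually calling it would need a TensorRT-enabled build and a GPU.

```python
def bind_with_tensorrt(sym, arg_params, aux_params, batch_shape):
    """Hypothetical helper: bind a TensorRT-partitioned symbol for inference."""
    import mxnet as mx  # lazy import; requires a TensorRT-enabled MXNet build

    # Partition the graph so supported subgraphs are handed to TensorRT.
    trt_sym = sym.get_backend_symbol('TensorRT')

    # Rebind the names to the returned dictionaries instead of relying on
    # the inputs being modified in place.
    arg_params, aux_params = mx.contrib.tensorrt.init_tensorrt_params(
        trt_sym, arg_params, aux_params)

    # Bind as usual (inference only, so no gradients) and load the parameters.
    executor = trt_sym.simple_bind(ctx=mx.gpu(0), data=batch_shape,
                                   grad_req='null', force_rebind=True)
    executor.copy_params_from(arg_params, aux_params)
    return executor
```

Rebinding the names to the return values keeps the snippet correct whether or not the in-place modification is later removed.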

aaronmarkham · 2019-08-26T16:30:08Z

@KellenSunderland Is this PR good to go now?

KellenSunderland · 2019-08-30T04:18:39Z

Should be ok to go now, would appreciate a review @aaronmarkham.

aaronmarkham

Minor URL updates needed... and maybe a docker pull line for clarity...

aaronmarkham · 2019-08-30T18:05:44Z

docs/tutorials/tensorrt/inference_with_trt.md

-```
-
-If you are running an operating system other than Ubuntu 16.04, or just prefer to use a docker image with all prerequisites installed you can instead run:
+If you are running an operating system other than Ubuntu 18.04, or just prefer to use a docker image with all prerequisites installed you can instead run:


I think this is still missing.

aaronmarkham · 2019-08-30T18:06:35Z

docs/tutorials/tensorrt/inference_with_trt.md

 ```
 nvidia-docker run -ti mxnet/tensorrt python
 ```

 ## Sample Models
 ### Resnet 18
-TensorRT is an inference only library, so for the purposes of this blog post we will be using a pre-trained network, in this case a Resnet 18.  Resnets are a computationally intensive model architecture that are often used as a backbone for various computer vision tasks. Resnets are also commonly used as a reference for benchmarking deep learning library performance.  In this section we'll use a pretrained Resnet 18 from the [Gluon Model Zoo](https://mxnet.incubator.apache.org/versions/master/api/python/gluon/model_zoo.html) and compare its inference speed with TensorRT using MXNet with TensorRT integration turned off as a baseline.
+TensorRT is an inference only library, so for the purposes of this tutorial we will be using a pre-trained network, in this case a Resnet 18.  Resnets are a computationally intensive model architecture that are often used as a backbone for various computer vision tasks. Resnets are also commonly used as a reference for benchmarking deep learning library performance.  In this section we'll use a pretrained Resnet 18 from the [Gluon Model Zoo](https://mxnet.incubator.apache.org/versions/master/api/python/gluon/model_zoo.html) and compare its inference speed with TensorRT using MXNet with TensorRT integration turned off as a baseline.


Can you use a relative link instead?

aaronmarkham · 2019-08-30T18:07:18Z

docs/tutorials/tensorrt/inference_with_trt.md

@@ -118,7 +108,7 @@ for i in range(0, 10000):
 end = time.time()
 print(time.process_time() - start)
 ```
-We run timing with a warmup once more, and on the same machine, run in **18.99s**. A 1.8x speed improvement!  Speed improvements when using libraries like TensorRT can come from a variety of optimizations, but in this case our speedups are coming from a technique known as [operator fusion](http://dmlc.ml/2016/11/21/fusion-and-runtime-compilation-for-nnvm-and-tinyflow.html).
+We run timing with a warmup once more, and on the same machine, run in **9.83s**. A 1.75x speed improvement!  Speed improvements when using libraries like TensorRT can come from a variety of optimizations, but in this case our speedups are coming from a technique known as [operator fusion](http://dmlc.ml/2016/11/21/fusion-and-runtime-compilation-for-nnvm-and-tinyflow.html).


I think dmlc.ml is gone...

Darn, that was a good guide. Will update.
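
As a quick sanity check on the numbers in this hunk, the quoted speedups are just the baseline time divided by the TensorRT time. The `speedup` helper is a hypothetical name; the timings are the ones quoted in the diffs above.

```python
def speedup(baseline_seconds, optimized_seconds):
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_seconds / optimized_seconds


# Updated figures: RTX 2070 baseline vs. TensorRT run.
new_ratio = speedup(17.20, 9.83)
# Previous figures: Titan V baseline vs. TensorRT run.
old_ratio = speedup(33.73, 18.99)

print(f"updated: {new_ratio:.2f}x, previous: {old_ratio:.2f}x")
```

This gives roughly 1.75x for the updated timings and roughly 1.78x for the previous ones, consistent with the "1.75x" and "1.8x" claims in the two versions of the text.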

ptrendx · 2019-10-22T19:50:02Z

@KellenSunderland What is the status of this PR? You listed better documentation for MXNet-TRT as one of the things to do for the 1.6 release, which has a code freeze this week.

KellenSunderland · 2019-11-01T17:02:25Z

Hey @ptrendx. Sorry, given we haven't set up CD for this feature and we're behind on releases, I'd recommend we remove this tutorial for the time being. It's asking a little too much of our users to build mxnet with this feature supported correctly IMO. I'll have a look at how CD is set up and see if we can auto-build binaries, then re-add this tutorial when the docker and python packages are up-to-date.

ptrendx · 2019-11-01T17:29:17Z

@aaronmarkham Could you please review this tutorial again? We would like to include it in the 1.6 release.

KellenSunderland · 2019-11-27T17:12:55Z

Friendly ping @aaronmarkham.

samskalicky

LGTM

KellenSunderland requested a review from szha as a code owner May 2, 2019 17:51

KellenSunderland force-pushed the trt_tutorial branch from b3dfee0 to 5985d9c Compare May 2, 2019 17:56

KellenSunderland changed the title ~~[Review but do not merge before 1.5] Update TRT tutorial with new APIs~~ [Review, don't merge before 1.5] Update TRT tutorial with new APIs May 2, 2019

KellenSunderland added the CUDA label May 2, 2019

marcoabreu added the pr-work-in-progress PR is still work in progress label May 9, 2019

KellenSunderland added pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress labels May 11, 2019

aaronmarkham reviewed May 14, 2019

marcoabreu added pr-work-in-progress PR is still work in progress and removed CUDA pr-awaiting-review PR is waiting for code review labels Jun 16, 2019

KellenSunderland force-pushed the trt_tutorial branch 2 times, most recently from 7fd71ab to 2ca9b54 Compare August 13, 2019 05:10

KellenSunderland mentioned this pull request Aug 13, 2019

[Discussion] 1.5.1 Patch Release #15613

Closed

samskalicky reviewed Aug 13, 2019

KellenSunderland changed the title ~~[Review, don't merge before 1.5] Update TRT tutorial with new APIs~~ Update TRT tutorial with new APIs Aug 13, 2019

KellenSunderland force-pushed the trt_tutorial branch from 2ca9b54 to 1a22aad Compare August 13, 2019 16:21

Caenorst reviewed Aug 14, 2019

KellenSunderland force-pushed the trt_tutorial branch from 1a22aad to 2327c44 Compare August 30, 2019 04:18

KellenSunderland mentioned this pull request Aug 30, 2019

[v1.5.x] Update TRT tutorial with new APIs #16044

Merged

4 tasks

aaronmarkham suggested changes Aug 30, 2019

KellenSunderland force-pushed the trt_tutorial branch from 2327c44 to 464b72d Compare November 1, 2019 17:00

KellenSunderland changed the title ~~Update TRT tutorial with new APIs~~ Temporarily remove TRT tutorial. Nov 1, 2019

KellenSunderland force-pushed the trt_tutorial branch from 464b72d to 2a01d90 Compare November 1, 2019 17:26

KellenSunderland changed the title ~~Temporarily remove TRT tutorial.~~ Update TensorRT tutorial to build-from-source. Nov 1, 2019

ptrendx requested a review from aaronmarkham November 1, 2019 17:28

KellenSunderland added pr-awaiting-review PR is waiting for code review and removed pr-work-in-progress PR is still work in progress labels Nov 1, 2019

Update TensorRT tutorial to build-from-source.

2f93284

KellenSunderland force-pushed the trt_tutorial branch from 2a01d90 to 2f93284 Compare November 1, 2019 22:23

samskalicky approved these changes Nov 27, 2019

aaronmarkham approved these changes Nov 27, 2019

ptrendx merged commit 7713a43 into apache:master Nov 27, 2019
