This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Add unit tests for TensorRT integration and fix some bugs #15399

Merged
merged 20 commits into from
Oct 20, 2019

Conversation

Caenorst
Contributor

Description

The TensorRT integration lacked unit tests; instead we relied on comparing the output of a full network, which is not very pertinent, makes it difficult to choose a tolerance, and is not very helpful when it fails.

This PR has two purposes:

  1. Add unit tests for all operations:
    Since we only partition subgraphs of at least 2 ops, we always append an identity to each output. We then compare against MXNet, using both TRT FP32 and FP16 computation.
  2. Fix a number of edge-case bugs exposed by the unit tests:
  • NHWC layout is currently not compatible with TensorRT
  • Pooling is currently not compatible with count_include_pad and is only compatible with the pooling_convention Valid
  • FullyConnected without bias cannot be converted to MatMul (a transpose would be needed first); for the moment we decided to drop support, since FullyConnected without bias is quite rare anyway
  • Concat is not supported if the concatenation axis is the batch axis
  • Dropout must only be enabled in training mode (it acts as identity at inference)
  • BatchNorm with fix_gamma can still have gamma != 1, so we force the value to 1 when loading it
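The unit-test comparison described above can be sketched as follows. This is a hypothetical helper with illustrative tolerance values, not the actual test code from this PR; the point is only that FP16 needs a much looser bound than FP32.

```python
import numpy as np

def outputs_match(mx_out, trt_out, dtype):
    """Compare an MXNet reference output against a TensorRT output.

    Tolerances are illustrative: FP16 accumulation needs a looser
    bound than FP32.
    """
    if dtype == np.float16:
        rtol, atol = 1e-2, 1e-2   # looser bound for half precision
    else:
        rtol, atol = 1e-4, 1e-5   # tighter bound for single precision
    return np.allclose(np.asarray(mx_out, dtype=np.float32),
                       np.asarray(trt_out, dtype=np.float32),
                       rtol=rtol, atol=atol)
```

Per-operator tolerances like these are far easier to justify than a single tolerance over a full network's output.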

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Comments

  • the full-network output-comparison test should probably be replaced by an accuracy comparison over a full dataset

@Caenorst Caenorst requested a review from szha as a code owner June 28, 2019 08:14
@anirudhacharya
Member

@mxnet-label-bot add [pr-awaiting-review]

@marcoabreu marcoabreu added the pr-awaiting-review PR is waiting for code review label Jun 28, 2019
@Caenorst
Contributor Author

@KellenSunderland
Contributor

Looks like CI caught a few issues. For example http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-15399/1/pipeline seems like it should be relevant to this PR. I'll have a look to see if there's anything else that jumps out at me.

@@ -157,6 +157,12 @@ std::string ConvertNnvmGraphToOnnx(
return serialized_onnx_graph;
}

void ConvertIdentity(NodeProto* node_proto, const NodeAttrs& attrs,
Contributor

Any idea if TRT actually optimizes this out? I've seen this in a few prod services :-/

Contributor Author

I believe this should be optimized by ONNX-TRT

return (param.dim != 0);
}

if (op_name == "Dropout") {
Contributor

Again, will TensorRT optimize this out? We don't want it at inference time right?

Contributor Author

Dropout has always been treated as an identity function in the MXNet-TensorRT integration, so I don't see any change here. As for whether identity actually performs a copy or not, I'm not quite sure; here is the onnx-tensorrt conversion: https://github.com/onnx/onnx-tensorrt/blob/0ab159579551cabfa05fd66f338357f116e96835/trt_utils.hpp#L169-L180
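For reference, inference-mode dropout reduces to a pass-through, which is why mapping it to Identity is safe. A minimal sketch of inverted dropout (a hypothetical helper, not MXNet's implementation):

```python
import numpy as np

def dropout(x, p, training):
    """Inverted dropout: masks and rescales at training time,
    acts as identity at inference time.

    The inference branch returning the input unchanged is what lets
    a converter map Dropout to an Identity node.
    """
    if not training:
        return x  # identity: no masking, no rescaling
    # Zero out elements with probability p, rescale survivors by 1/(1-p)
    mask = (np.random.rand(*x.shape) >= p).astype(x.dtype)
    return x * mask / (1.0 - p)
```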

Contributor

Ok, non-blocking comment for this PR. I'm just thinking about adding a warning in the future if people are using TRT with operations that don't make sense at inference time (Dropout, Ident, Empty Concats or Copies, etc.)

Contributor

@KellenSunderland KellenSunderland left a comment

A few small changes requested. Looks like CI caught a few issues as well.

@karan6181
Contributor

@Caenorst Could you please address the review comments? Thanks!

@piyushghai
Contributor

@Caenorst Gentle ping...

@Caenorst
Contributor Author

Caenorst commented Oct 7, 2019

I don't understand the error on windows-gpu; it doesn't seem related to my modifications...

@Caenorst
Contributor Author

@KellenSunderland can we merge it? (I made a bunch of modifications since the last review that you may want to review too)

@KellenSunderland KellenSunderland merged commit 746cbc5 into apache:master Oct 20, 2019